APPLICATIONS  OF  NEURAL  NETWORKS 

IN 

HIGH  ENERGY  PHYSICS 


BY 

MOHAMMAD  REZA  TAYEBNEJAD 


A  DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 
1997 


ACKNOWLEDGEMENTS 


Writing this thesis culminates the seven-year period of my study in Gainesville.
During this period, numerous people have provided me with their support and
wisdom, without which I could not have dreamed of accomplishing what I have.
Among them, my research advisor, Professor Field, has been very influential
in my life as well as my work. I extend my deepest gratitude to him not only
for his academic advice but also for his kindness and for the patience he has
shown while advising me in my physics career. I would like to express my special
thanks to my brother Alireza, who has been very instrumental in my coming to the
United States and pursuing my education. I could not have been where I am without
the help and support that he gave me over the years and without the words of
encouragement he never spared while guiding me through obstacles. Last but not
least, I thank my wife Farideh and my children Ali, Anna, and Armin for their
continual support, understanding, and encouragement.




TABLE  OF  CONTENTS 


ACKNOWLEDGMENTS

ABSTRACT

CHAPTERS

1  INTRODUCTION

2  STANDARD MODEL
2.1  Introduction to Gauge Principles
2.2  Non-Abelian Extension
2.3  Spontaneously Broken Symmetries
2.4  Higgs Mechanism
2.5  Electroweak Theory of the Standard Model
2.6  The Higgs Field and the Extended Lagrangian
2.7  Extension of the Electroweak Theory to Quarks
2.8  Quark Masses and Mixing
2.9  Quantum Chromodynamics, QCD
2.10  Properties of the Top Quark

3  NEURAL NETWORKS AND FISHER DISCRIMINANTS
3.1  Artificial Neural Networks
3.2  Feed-forward Neural Structure
3.3  Learning Methods
3.4  Bias Input and Activation Function
3.5  Hidden Layer Weight Update
3.6  Local Minima, Flat Spaces, and Overfitting
3.7  Statistical Classifiers
3.8  Normal Distributions
3.9  Bayes Classifiers
3.10  Fisher Discriminants

4  ENHANCING THE HIGGS BOSON SIGNAL
4.1  Higgs Decay Processes
4.2  Event Generation Without Pile-up
4.3  Data Selection and Cuts
4.4  Network Analysis Without Pile-up
4.5  Fisher Discriminants
4.6  Network Cut-off
4.7  Event Generation and Cuts With Pile-up
4.8  Network Performance

5  ENHANCING THE TOP QUARK SIGNAL
5.1  Top Decay Processes
5.2  Event Generation
5.3  Lepton Plus Missing Transverse Energy Trigger
5.4  Calorimeter Cell Cuts
5.5  Reconstructing the Top-Pair Invariant Mass
5.6  Comparing With the Parton-Parton CM Energy
5.7  Event Variables: Fox-Wolfram Moments
5.8  Neural Network Analysis
5.9  Fisher Discriminants

6  SUMMARY AND CONCLUSION
6.1  Neural Analysis
6.2  Higgs Data Analysis
6.3  Top Data Analysis
6.4  Final Remarks

APPENDICES

A  H, W±, AND Z DECAY MODES
A.1  Z Decay into Lepton Pair
A.2  Vector Bosons Decay Modes
A.3  Higgs Boson Decay Modes

B  NOTATIONS AND CONVENTIONS
B.1  Feynman Rules
B.2  Decay Rate
B.3  Traces and Contraction Identities

REFERENCES

BIOGRAPHICAL SKETCH


Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

APPLICATIONS  OF  NEURAL  NETWORKS 

IN 

HIGH  ENERGY  PHYSICS 


By 

Mohammad  Reza  Tayebnejad 
May  1997 

Chairman:  Richard  D.  Field 
Major  Department:  Physics 

The use of Artificial Neural Networks as a tool in High Energy Physics is
investigated. Neural networks are used to analyze data resulting from
hadron-hadron collisions. Collider data contain many observables, which can be
very cumbersome to analyze with traditional methods. We present two case
studies, applying neural networks to the production of Higgs bosons and of Top
quarks. In both cases, neural networks provide further enhancement in the
identification of the signal processes over the background processes. In the
investigation of the Higgs boson, neural networks are used to help distinguish
the $ZZ \to \ell^+\ell^-$-jet-jet signal produced by the decay of a 400 GeV
Higgs boson at a proton-proton collider energy of 15 TeV from the "ordinary"
QCD $Z$+jets background. The ideal case where only one event at a time enters
the detector (no pile-up) and the case of multiple interactions per beam
crossing (pile-up) are examined. In both cases, when used in conjunction with
the standard cuts, neural networks provide an additional signal to background
enhancement. In addition, we investigate the event signature of the
$\ell\nu b\bar{b}q\bar{q}$ decay mode of Top-pair production in
proton-antiproton collisions at 1.8 TeV. Neural networks and Fisher
discriminants are used in conjunction with modified Fox-Wolfram "shape"
variables to help distinguish the Top-pair signal from the $W$+jets and
$b\bar{b}$+jets backgrounds. Instead of requiring at least four jets in the
event, we find that it is faster and better to simply cut on the number of
calorimeter cells with transverse energy greater than some minimum. By
combining these cell cuts with the event shape information, we are able to
obtain a signal to background ratio of around 9 while keeping 30% of the
signal. This corresponds to a signal to background enhancement of around 370.




CHAPTER  1 
INTRODUCTION 


Artificial Neural Network (ANN) technology is gaining wide acceptance among
High Energy Physicists as a tool for complex data analysis and event
classification [1, 2, 3, 4, 5]. Advances in accelerator technology, with ever
increasing energy, luminosity, and event complexity, have made conventional
methods insufficient. It is not unusual for a high energy physicist to find
himself searching for a few signal events among billions of interactions. In
Particle Physics, Artificial Neural Networks can be used to solve
classification problems such as particle identification, or to reject
background and enhance the overall signal to background ratio.

Relevant to this discussion is, of course, the search for two of the most
elusive Standard Model particles, the Top quark and the Higgs boson, which have
been very difficult to observe because of the immense number of background
events. In the case of the Top quark, its discovery has been announced by CDF
[6, 7, 8, 9], but Neural Networks could provide an alternative method and
perhaps further evidence to help substantiate its existence. Artificial Neural
Network technology is still in its growing stage and is far from being a mature
science, yet it has already shown many promising results. Our goal in this
thesis is to investigate the use of Neural Networks as a tool in High Energy
Physics [10], and in particular in the analysis of Higgs boson and Top quark
data.

An important challenge in high energy phenomenology and experiment is pattern
recognition [11]. For non-linear modeling, standard methods that assume linear
dependencies are quite insufficient, and other methods have to be used in order
to capture non-linear as well as linear dependencies. There are many different
methods for multivariate statistical analysis; Artificial Neural Networks (ANN)
and Fisher Discriminants represent only a small subset of them. In analyzing
High Energy Physics data, the standard procedure is to make various cuts on the
observed kinematical variables in order to single out the desired collection of
events (signal). A specific selection of cuts corresponds to a particular set
of feature functions. This procedure is often tedious and not very systematic.
Ideally, one would like to have an automated optimal choice of a feature
function F(x), which is precisely what Artificial Neural Networks and Fisher
Discriminants offer.
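
As a purely illustrative sketch of what an automated feature function F(x) means, the following Python fragment contrasts a hand-chosen rectangular cut with a two-class Fisher linear discriminant computed from toy data. The toy points, the cut thresholds, and all function names are assumptions made for this illustration; they are not taken from the analyses in this thesis.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def scatter(points, m):
    """2x2 scatter matrix of the points about their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(signal, background):
    """Fisher direction w = S_W^{-1} (m_signal - m_background) in 2-D."""
    ms, mb = mean(signal), mean(background)
    ss, sb = scatter(signal, ms), scatter(background, mb)
    sw = [[ss[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    d = [ms[0] - mb[0], ms[1] - mb[1]]
    return [inv[0][0] * d[0] + inv[0][1] * d[1],
            inv[1][0] * d[0] + inv[1][1] * d[1]]


def cut_feature(x, thresholds=(1.0, 1.0)):
    """Hand-chosen rectangular cut: 1 if every variable exceeds its threshold."""
    return 1.0 if x[0] > thresholds[0] and x[1] > thresholds[1] else 0.0


def fisher_feature(x, w):
    """Linear feature function F(x) = w . x."""
    return w[0] * x[0] + w[1] * x[1]


# Hypothetical toy events in two kinematic variables.
signal = [(2.0, 2.5), (2.5, 2.0), (3.0, 3.5), (3.5, 3.0)]
background = [(0.5, 1.0), (1.0, 0.5), (1.5, 2.0), (2.0, 1.5)]
w = fisher_direction(signal, background)
```

For these toy points the rectangular cut lets two of the four background events through, whereas a single threshold on `fisher_feature` separates the two samples. A neural network generalizes the same idea to non-linear feature functions.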

Biological systems implement pattern recognition computations via
interconnections of a large number of processing cells called neurons. The
adaptability, context-sensitivity, error-tolerance, large memory capacity, and
real-time capability of the human information processing system motivate us to
emulate this computational mechanism. It is worth noting that a "neuron
operation" in the human brain is many orders of magnitude slower than in a
hardware implementation of an ANN (~ $10^{-3}$ sec for the brain versus
~ $10^{-9}$ sec for hardware), and therefore, for sequential operations, a
computer can outperform the human brain. Despite this disadvantage in speed,
the ability of the brain to perform pattern recognition tasks is unsurpassed by
that of ANNs or statistical approaches. Its computing power derives from the
enormous number of neurons present in the brain (~ $10^{11}$, compared to
~ $10^{3}$ in an ANN) and from the high level of connectivity of the biological
neurons, which together act as a powerful Parallel Distributed Processor. The
resulting computing power of the human brain can be estimated to be of the
order of $10^{15}$ operations/sec, with a memory capacity of $10^{8}$ Mbytes.
It is, therefore, desirable to emulate the distributed computing mechanism of
the human brain in the form of hardware and software implementations of a
neural network system. There is increasing interest in understanding the
working principles of neural systems and in applying these principles to
information processing. In this effort, scientists from different fields are
working together. Inspired by the biological model, neural and evolutionary
algorithms have been developed and successfully used for the analysis of
complex problems. Together with specific neural hardware development, such
algorithms are now beginning to find applications in many fields of scientific
research, such as High Energy Physics.

Contrary to the statistical approach to function fitting or pattern
recognition, Artificial Neural Networks are, in general, non-parametric and
non-algorithmic. ANN models do not make any assumption about the parametric
form of the function they model. In this regard, they are more powerful than
parametric methods that try to fit reality into a specific parametric form.
However, non-parametric methods like ANN normally contain more free parameters
than conventional function fitting, and hence require more training data in
order to achieve good generalization performance. Fortunately, for most high
energy problems one has access to a large data sample, making it possible to
exploit the capabilities of non-parametric models like ANN. An ANN has no
predefined algorithm to follow; the goal is to teach it to respond correctly
(e.g., in classification problems) after being exposed to training samples.
Therefore, for an ANN the required amount of a priori knowledge and detailed
knowledge of the internal system operation is minimal. After training, the hope
is that the internal (neural) structure of the ANN will "self-organize" to
enable extrapolation when faced with new patterns.

In terms of architecture, an Artificial Neural Network can be characterized as
a network of many very simple processors (units), each possibly having a small
local memory. The units are connected by unidirectional communication channels
which carry numeric, as opposed to symbolic, data. The units operate only on
their local data and on the inputs they receive via the connection channels.
These artificial neurons, which constitute the basic building block of the
network, are its central processing elements. What distinguishes networks from
each other is basically the processing method adopted for each network unit and
the way the basic units communicate with each other (i.e., the network
architecture). Therefore, the two key elements in designing an ANN are the
architecture of the network and the algorithm used to train it. The information
learned is stored in the processing units in the form of connection weights,
and these weight parameters are adjusted on the basis of the presented
patterns. In other words, neural networks "learn" from examples. The ability to
learn and the ability to generalize characterize the performance and
versatility of a given network.

The significant differences between Artificial Neural Networks and conventional
single-processor, sequential computing are:

•  Distributed associative recognition of complex structure: The values of the
weights represent the state of knowledge of the ANN, without a specific
association of a piece of knowledge with a particular neuron. Association
means that if the ANN is shown a partial input, the network will choose the
closest match in memory to that input and generate an output corresponding
to the full input. If the network is autoassociative, this results in the
completion of the input vector. Thus ANNs can handle incomplete, noisy, or
previously unseen data. This last feature is called generalization, which
is characteristic of non-linear network classifiers.

•  Fault tolerance: The data may be incomplete, inconsistent, or noisy, and yet
the network can continue to operate without substantial operational loss.
Destruction or alteration of one or more processing elements causes only a
slight degradation of the network performance. This is a consequence of the
distributed information storage, in contrast to the single-processor
computer, where the failure of a single element causes a failure of the
entire computational process.

•  The systems are trainable, and can learn as well as organize themselves:
Contrary to the algorithmic method, where a priori knowledge of the process
is necessary for its success, ANNs do not make any assumption about the
outcome. The network is capable of learning and adapting on the basis of the
patterns seen in the training sample.

•  Algorithms and hardware are inherently parallel: Pattern recognition tasks
require the ability to match large amounts of input information
simultaneously and to generate categorical or generalized output. The ANN is
intrinsically a massively parallel computational device. The idea is to
emulate the human brain, whose power lies in the sheer number of neurons
connected in parallel and not in its speed.

The method of error backpropagation, or backprop [12, 13], is the most widely
used learning method for Neural Networks today. Although it has a number of
disadvantages, it has been successful in practical applications and is
relatively easy to apply. Backpropagation needs a "teacher" that knows the
correct output for any input ("supervised learning") and uses gradient descent
on the error (as provided by the teacher) to train the weights. The use of
gradient descent makes the network slow to train; but, being a feed-forward
network, it is quite rapid during the recall phase.
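
A minimal sketch of gradient-descent training by error backpropagation for a feed-forward network with one hidden layer is given below. It assumes sigmoid units, a summed squared error, and per-pattern weight updates; the function names, learning rate, network size, and the XOR-like toy patterns are illustrative assumptions, not the configuration used in this thesis.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hid, w_out):
    """Recall phase: one forward pass. Each weight row ends with a bias weight."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in w_hid]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h + [1.0])))
    return h, y


def total_error(patterns, w_hid, w_out):
    """Summed squared error E = 1/2 sum (y - t)^2 over all training patterns."""
    return 0.5 * sum((forward(x, w_hid, w_out)[1] - t) ** 2 for x, t in patterns)


def train(patterns, n_hidden=4, rate=0.5, epochs=5000, seed=1):
    """Supervised training: gradient descent on E, one update per pattern."""
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    w_hid = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in patterns:
            h, y = forward(x, w_hid, w_out)
            # Output error signal, then propagate it back to the hidden layer
            # (computed before the output weights are changed).
            delta_out = (y - t) * y * (1.0 - y)
            delta_hid = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                         for j in range(n_hidden)]
            for j, hj in enumerate(h + [1.0]):
                w_out[j] -= rate * delta_out * hj
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w_hid[j][i] -= rate * delta_hid[j] * xi
    return w_hid, w_out


# Hypothetical toy patterns (input list, teacher output): the "signal" class
# occupies two disjoint corners, so no single linear cut separates it.
xor_patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
                ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
```

Calling `train(xor_patterns)` returns the trained weights, and `forward(x, w_hid, w_out)` then performs the fast recall step. The many passes over the training set inside `train` are what make the learning phase slow by comparison.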




Feed-forward nets are a subset of the class of nonlinear regression and
discrimination models. They are capable of recognizing complex patterns, such
as disjoint clusters of signal within an underlying background. In High Energy
Physics, the complexity arises from the sheer number of variables involved in
the study of the physical phenomenon. For such a complex variable space,
independent analysis of the variables, in order to single out the desired
process, falls short. A more systematic approach is required, and Feed-forward
Neural Networks provide a coherent and collective approach to deal with these
complexities.

In  this  thesis,  we  investigate  the  use  of  Artificial  Neural  Networks  and  Fisher 
Discriminants  in  High  Energy  Physics.  We  apply  Neural  Networks  to  analyze  data 
resulting  from  hadron-hadron  collisions.  We  also  show  that  Fisher  Discriminants 
can  be  used  in  some  limited  cases.  To  demonstrate  the  capabilities  of  ANN,  we 
use  Feed-forward  Neural  Networks  in  Higgs  boson  and  Top  quark  phenomenology. 
In  both  cases,  Neural  Networks  provide  further  enhancement  in  the  identification 
of  the  signal  subprocesses  over  the  background. 


CHAPTER  2 
STANDARD  MODEL 


The main focus of this thesis is to investigate the possibility of using new
methods in the search for the Higgs boson and the Top quark. We have not,
however, explained why we believe these particles exist in the first place. It
is therefore necessary to discuss the underlying theory, the Electroweak Theory
of the Standard Model, in which these particles are necessary for the
consistency and renormalizability of the theory. It is important to note that
their masses, along with the 16 other parameters of the minimal Standard Model,
are free parameters that are left to be determined by experiment; they are not
predicted explicitly by the model. In this chapter, therefore, we shall review
the gauge theory of the weak interactions and briefly discuss its extension to
the SU(3) of the strong interactions [14, 15].


2.1    Introduction to Gauge Principles

In the context of gauge theory, one tries to understand, or perhaps guess, the
dynamics of the forces involved in a process on the basis of certain gauge
principles. To demonstrate this, we start with an example from quantum
mechanics.

In quantum mechanics a state is described by a complex Schrödinger wave
function $\psi(x)$. An observable is computed as an expectation value of the
form

$$\langle O \rangle = \int d^3x\, \psi^*(x)\, O\, \psi(x), \tag{2.1}$$

which is clearly invariant under the global phase transformation

$$\psi(x) \to e^{i\alpha}\,\psi(x). \tag{2.2}$$

Hence, under the above global transformation, the absolute phase of the wave
function cannot be determined. The relative phase between wave functions, as
measured in interference experiments, is not affected. A problem arises when we
try to impose different phase transformations at different points in space,

$$\psi(x) \to \psi'(x) = e^{i\alpha(x)}\,\psi(x). \tag{2.3}$$

The equations of motion in quantum mechanics normally involve derivatives of
the wave function. Under a local phase rotation these derivative terms
transform as

$$\partial_\mu\psi(x) \to \partial_\mu\psi'(x) = e^{i\alpha(x)}\left[\partial_\mu\psi(x) + i\,(\partial_\mu\alpha(x))\,\psi(x)\right]. \tag{2.4}$$

The additional term involving the gradient of the phase spoils local phase
invariance. Local phase invariance may be achieved, however, if the equations
of motion and the observables involving derivatives are modified by the
introduction of a field $A_\mu(x)$. Suppose the gradient $\partial_\mu$ is
replaced everywhere by the "covariant derivative"

$$\mathcal{D}_\mu = \partial_\mu + ieA_\mu, \tag{2.5}$$

and the field $A_\mu(x)$ transforms under the above local phase rotation as

$$A_\mu(x) \to A'_\mu(x) = A_\mu(x) - \frac{1}{e}\,\partial_\mu\alpha(x). \tag{2.6}$$

Then it can be shown that under the local phase transformation

$$\mathcal{D}_\mu\psi(x) \to e^{i\alpha(x)}\,\mathcal{D}_\mu\psi(x). \tag{2.7}$$

This guarantees that quantities like $\psi^*\mathcal{D}_\mu\psi(x)$ are
invariant under local phase rotations. This example shows that by requiring
local gauge invariance we are forced to introduce the electromagnetic field,
and in the process the dynamics of the field and its coupling to the matter
field are suggested.
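
The step from Eqs. (2.4)-(2.6) to Eq. (2.7) can be written out explicitly. The following worked derivation is a sketch that uses only the transformation rules stated above, with $\mathcal{D}'_\mu = \partial_\mu + ieA'_\mu$ denoting the covariant derivative built from the transformed field:

```latex
\begin{aligned}
\mathcal{D}'_\mu \psi'(x)
  &= \left(\partial_\mu + ieA'_\mu\right) e^{i\alpha(x)}\psi(x) \\
  &= e^{i\alpha(x)}\left[\partial_\mu\psi + i(\partial_\mu\alpha)\psi\right]
     + ie\left[A_\mu - \frac{1}{e}\,\partial_\mu\alpha\right] e^{i\alpha(x)}\psi \\
  &= e^{i\alpha(x)}\left[\partial_\mu\psi + ieA_\mu\psi\right] \\
  &= e^{i\alpha(x)}\,\mathcal{D}_\mu\psi(x) .
\end{aligned}
```

The gradient-of-phase term produced by the derivative is cancelled exactly by the shift of $A_\mu$, which is why the field must transform as in Eq. (2.6).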




2.2    Non-Abelian  Extension 

In the previous discussion we extended the global $U(1)$ rotation to a local
transformation, and the consequence was the introduction of the new field
$A_\mu$. We now try to extend local gauge invariance to a non-Abelian group of
transformations. We proceed by developing the $SU(2)$ gauge theory first
introduced by Yang and Mills. For this purpose we consider the specific example
of the free Dirac Lagrangian

$$\mathcal{L} = \bar\psi\left(i\gamma^\mu\partial_\mu - m\right)\psi, \tag{2.8}$$

where $\psi$ is a Dirac doublet

$$\psi(x) = \begin{pmatrix}\psi_1(x)\\ \psi_2(x)\end{pmatrix}. \tag{2.9}$$

The Lagrangian above is invariant under global $SU(2)$ transformations. In
analogy with the Abelian case, we want to extend this global invariance to
local $SU(2)$ invariance. Under a local gauge transformation the field
transforms as

$$\psi(x) \to \psi'(x) = U(x)\,\psi(x), \tag{2.10}$$

where $U(x) \in SU(2)$ for all $x$,

$$U(x) = \exp\left(\frac{i}{2}\,\boldsymbol{\tau}\cdot\boldsymbol{\alpha}(x)\right), \tag{2.11}$$

and satisfies

$$U(x)\,U^\dagger(x) = 1, \tag{2.12}$$

$$\det U(x) = 1. \tag{2.13}$$

The gradient transforms as

$$\partial_\mu\psi \to U(\partial_\mu\psi) + (\partial_\mu U)\,\psi. \tag{2.14}$$




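To achieve local gauge invariance we introduce a gauge-covariant derivative

$$\mathcal{D}_\mu = I\,\partial_\mu + igB_\mu, \tag{2.15}$$

where $I$ is the $2\times 2$ identity matrix and $g$ will be seen to play the
role of the interaction coupling constant. The object $B_\mu$ is the
$2\times 2$ matrix defined by

$$B_\mu = \frac{1}{2}\,\boldsymbol{\tau}\cdot\mathbf{B}_\mu = \frac{1}{2}\begin{pmatrix} B^3_\mu & B^1_\mu - iB^2_\mu \\ B^1_\mu + iB^2_\mu & -B^3_\mu \end{pmatrix}, \tag{2.16}$$

where the three gauge fields are $\mathbf{B}_\mu = (B^1_\mu, B^2_\mu, B^3_\mu)$.
To introduce the gauge field we replace the gradient term with the
gauge-covariant derivative. The resultant covariant derivative should transform
as

$$\mathcal{D}_\mu\psi \to \mathcal{D}'_\mu\psi' = U(\mathcal{D}_\mu\psi). \tag{2.17}$$

From the above requirement we can obtain the transformation property of the
gauge field:

$$\begin{aligned}
\mathcal{D}'_\mu\psi' &= (\partial_\mu + igB'_\mu)\,\psi' \\
 &= U(\partial_\mu\psi) + (\partial_\mu U)\psi + igB'_\mu(U\psi) \\
 &\equiv U(\partial_\mu + igB_\mu)\,\psi \\
 &= U(\partial_\mu\psi) + igU(B_\mu\psi),
\end{aligned} \tag{2.18}$$

which can be solved to yield the condition

$$igB'_\mu(U\psi) = igU(B_\mu\psi) - (\partial_\mu U)\psi. \tag{2.19}$$

In operator terms we can write

$$B'_\mu = UB_\mu U^{-1} + \frac{i}{g}(\partial_\mu U)U^{-1} = U\left[B_\mu + \frac{i}{g}\,U^{-1}(\partial_\mu U)\right]U^{-1}. \tag{2.20}$$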




Note that the above non-Abelian gauge transformation reduces to the ordinary
electromagnetic gauge transformation if we take $U$ to be the $U(1)$
electromagnetic phase rotation

$$U_{EM} = e^{iq\alpha(x)}, \tag{2.21}$$

where $\alpha(x)$ is the local phase. Applying the above transformation, along
with replacing the gauge field $B_\mu$ and coupling $g$ with the corresponding
electromagnetic field $A_\mu$ and charge $q$, yields

$$A'_\mu = U_{EM}A_\mu U_{EM}^{-1} + \frac{i}{q}(\partial_\mu U_{EM})U_{EM}^{-1} = A_\mu - \partial_\mu\alpha. \tag{2.22}$$

The matrix $B_\mu(x)$ is clearly Hermitian and traceless, because the Pauli
matrices are Hermitian and traceless:

$$B_\mu(x) = \frac{1}{2}\,\boldsymbol{\tau}\cdot\mathbf{B}_\mu = B^\dagger_\mu(x), \tag{2.23}$$

$$\mathrm{Tr}\,B_\mu(x) = 0. \tag{2.24}$$
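
The Hermiticity and tracelessness stated in Eqs. (2.23) and (2.24) can be checked numerically for any choice of real components. The following Python sketch builds the matrix of Eq. (2.16) from the Pauli matrices at a single space-time point; the function names and the sample component values are assumptions made only for this illustration.

```python
# Pauli matrices tau^1, tau^2, tau^3 as nested lists.
TAU = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def gauge_matrix(b):
    """B = (1/2) sum_a tau^a b^a for real components b = (b1, b2, b3)."""
    return [[0.5 * sum(b[a] * TAU[a][i][j] for a in range(3))
             for j in range(2)] for i in range(2)]


def dagger(m):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]


def trace(m):
    return m[0][0] + m[1][1]


# Hypothetical values of (B^1, B^2, B^3) at one point and one Lorentz index.
B = gauge_matrix((0.3, -1.2, 0.7))
```

Comparing `B` with `dagger(B)` entry by entry and evaluating `trace(B)` reproduces the two properties. The off-diagonal entries are $B^1 \mp iB^2$ and the diagonal entries are $\pm B^3$, each multiplied by 1/2, as in Eq. (2.16).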

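We have not, of course, gained anything so long as the vector potential $B_\mu$
is treated as an external field. The $B_\mu$ field must become a dynamical
variable. In order to construct the corresponding part of the Lagrangian we
will use the electromagnetic field strength as a model. The electromagnetic
field tensor can be written as

$$F_{\mu\nu} = \frac{1}{iq}\left[\mathcal{D}_\mu, \mathcal{D}_\nu\right]; \tag{2.25}$$

with $\mathcal{D}_\mu = \partial_\mu + iqA_\mu$ it becomes

$$F_{\mu\nu} = \frac{1}{iq}\left[(\partial_\mu + iqA_\mu), (\partial_\nu + iqA_\nu)\right] = \partial_\mu A_\nu - \partial_\nu A_\mu + iq\left[A_\mu, A_\nu\right], \tag{2.26}$$

where the commutator vanishes in an Abelian theory. This suggests that for the
$SU(2)$ gauge theory a candidate field-strength tensor is of the form

$$F_{\mu\nu} = \frac{1}{ig}\left[\mathcal{D}_\mu, \mathcal{D}_\nu\right] = \partial_\mu B_\nu - \partial_\nu B_\mu + ig\left[B_\mu, B_\nu\right]. \tag{2.27}$$

The components of the field-strength tensor are $B^a_{\mu\nu}$ (with
$a = 1, 2, 3$), which satisfy

$$F_{\mu\nu} = B^a_{\mu\nu}\,\frac{\tau^a}{2},$$

with

$$B^a_{\mu\nu} = \partial_\mu B^a_\nu - \partial_\nu B^a_\mu - g\,\epsilon_{abc}\,B^b_\mu B^c_\nu. \tag{2.28}$$

The $\epsilon_{abc}$ are the structure constants of the $SU(2)$ group. Including
the new field-strength tensor in our Lagrangian density we obtain the
Yang-Mills Lagrangian

$$\mathcal{L}_{YM} = \bar\psi\left(i\gamma^\mu\mathcal{D}_\mu - m\right)\psi - \frac{1}{2}\,\mathrm{Tr}\left(F_{\mu\nu}F^{\mu\nu}\right), \tag{2.29}$$

which is invariant under local gauge transformations. A mass term
$M^2B_\mu B^\mu$ is clearly incompatible with local gauge invariance. We will
see in the following sections how to give mass to the gauge field by breaking
the underlying symmetry.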

2.3    Spontaneously  Broken  Symmetries 

We have seen that the requirement of local gauge invariance leads to the
construction of interacting field theories. This, however, is not satisfactory,
because it requires the interacting gauge field to be massless. Moreover, the
theory is only applicable to exact symmetries, while nature exhibits numerous
symmetries that are only approximate. Among approximate symmetries, several
different realizations are possible. The Lagrangian may display an imperfect or
explicitly broken symmetry, or it may happen that the Lagrangian is symmetric
but the physical vacuum does not respect the symmetry. In the latter case, the
symmetry of the Lagrangian is said to be spontaneously broken. We shall study
the consequences of spontaneous symmetry breaking, which destroys the symmetry
of the physical vacuum but at the same time generates the observed masses of
the interacting vector fields via the "Higgs Mechanism".




Consider a Lagrangian for a self-interacting complex scalar field $\phi$, with
real components $\phi_1$ and $\phi_2$, which may be written in the form

$$\mathcal{L} = \frac{1}{2}\left[(\partial_\mu\phi_1)(\partial^\mu\phi_1) + (\partial_\mu\phi_2)(\partial^\mu\phi_2)\right] - V(\phi_1^2 + \phi_2^2). \tag{2.30}$$

The Lagrangian is invariant under the group $SO(2)$ of rotations in the
$(\phi_1, \phi_2)$ plane. We further assume that the effective potential is
given by

$$V(\phi^2) = \frac{1}{2}\,\mu^2\phi^2 + \frac{1}{4}\,|\lambda|\,(\phi^2)^2, \tag{2.31}$$

where $\phi^2 = \phi_1^2 + \phi_2^2$. We distinguish two cases. A positive
value of the parameter, $\mu^2 > 0$, corresponds to the ordinary case of exact
symmetry. The unique vacuum occurs at

$$\langle\phi\rangle_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

and for small field oscillations the Lagrangian takes the form

$$\mathcal{L} = \frac{1}{2}\left[(\partial_\mu\phi_1)(\partial^\mu\phi_1) - \mu^2\phi_1^2\right] + \frac{1}{2}\left[(\partial_\mu\phi_2)(\partial^\mu\phi_2) - \mu^2\phi_2^2\right]. \tag{2.32}$$

This is just the Lagrangian for a pair of scalar particles with common mass
$\mu$. Thus, in this case, the symmetry of the Lagrangian is preserved. The
choice $\mu^2 < 0$, however, leads to a spontaneous breakdown of the $SO(2)$
symmetry. The absolute minimum of the potential now occurs at

$$\langle\phi^2\rangle_0 = -\mu^2/|\lambda| \equiv v^2,$$

which corresponds to a continuum of distinct vacuum states that are degenerate
in energy. Let us select as the physical vacuum state the configuration

$$\langle\phi\rangle_0 = \begin{pmatrix} v \\ 0 \end{pmatrix}.$$

We can expand about the vacuum configuration as follows:

$$\phi' \equiv \phi - \langle\phi\rangle_0 = \begin{pmatrix} \eta \\ \zeta \end{pmatrix}.$$

For small variations in the field the Lagrangian becomes

$$\mathcal{L} = \frac{1}{2}\left[(\partial_\mu\eta)(\partial^\mu\eta) + 2\mu^2\eta^2\right] + \frac{1}{2}\left[(\partial_\mu\zeta)(\partial^\mu\zeta)\right]. \tag{2.33}$$

There are two particles in the spectrum, $\eta$ and $\zeta$. The
$\eta$-particle, associated with radial oscillations, has mass
$m^2 = -2\mu^2 > 0$, while the $\zeta$-particle (referred to as the Goldstone
boson) is massless. In general, one massless spin-zero particle will occur for
each broken generator of the original symmetry group. As a general rule, in any
field theory that obeys locality, Lorentz invariance, and positive-definite
norm on the Hilbert space, if an exact symmetry of the Lagrangian is not a
symmetry of the physical vacuum, then the theory must contain massless
spin-zero particles whose quantum numbers are those of the broken group
generators. These massless scalars, which are not observed in nature, disappear
altogether in the process referred to as the Higgs Mechanism, in which the
gauge fields acquire mass. In the next section, we look at one example of an
Abelian gauge field acquiring mass via the Higgs Mechanism.
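
The two cases of Eq. (2.31) can be checked numerically. The following Python sketch scans the potential for its minimum and estimates the curvature in the radial direction by a finite difference; the parameter values, grid, and function names are assumptions chosen only for illustration.

```python
def potential(phi_sq, mu_sq, lam):
    """V(phi^2) = (1/2) mu^2 phi^2 + (1/4) |lambda| (phi^2)^2, as in Eq. (2.31)."""
    return 0.5 * mu_sq * phi_sq + 0.25 * abs(lam) * phi_sq ** 2


def scan_minimum(mu_sq, lam, phi_sq_max=10.0, steps=100000):
    """Brute-force scan over 0 <= phi^2 <= phi_sq_max; returns phi^2 at the minimum."""
    best = min(range(steps + 1),
               key=lambda k: potential(k * phi_sq_max / steps, mu_sq, lam))
    return best * phi_sq_max / steps


def radial_curvature(v, mu_sq, lam, h=1e-4):
    """Finite-difference second derivative of V along phi_1 at (phi_1, phi_2) = (v, 0)."""
    f = lambda p: potential(p * p, mu_sq, lam)
    return (f(v + h) - 2.0 * f(v) + f(v - h)) / (h * h)


# Hypothetical parameters for the broken case mu^2 < 0.
mu_sq, lam = -2.0, 0.5
v_sq = -mu_sq / abs(lam)  # analytic minimum, v^2 = -mu^2 / |lambda|
```

With these values `scan_minimum(mu_sq, lam)` lands on $v^2 = -\mu^2/|\lambda|$, while for $\mu^2 > 0$ the scan returns the symmetric vacuum at $\phi^2 = 0$. The radial curvature at $\phi_1 = v$ agrees with $m^2 = -2\mu^2$ for the $\eta$-particle, and the potential is flat along the direction of $\zeta$.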

2.4    Higgs  Mechanism 

We  consider  an  Abelian  Higgs  model,  which  describes,  in  the  absence  of  sponta- 
neous symmetry  breaking,  the  electrodynamics  of  charged  scalars.  The  Lagrangian, 
which  is  [/(l)-invariant  is  given  by 

£  =  |P"0|2  -  „2|0|2  -  |A|(<f0)2  -  \F^\  (2.34) 


15 


where  (f>  is  a  complex  scalar  field,  is  the  covariant  derivative,  and  is  the 
gauge  field  tensor.  The  Lagrangian  is  invariant  under  U(l)  rotations 

and  under  the  local  gauge  transformations 

<f>(x)    ->   4>'{x)  =  eiqa{x)4>{x)  (2.35) 
A^x)   ->  =  A„(a?)  -  ^a(x).  (2.36) 

Two  cases  can  be  considered  here.  For  n2  >  0,  the  potential  has  a  unique  minimum 
at  (j>  =  0  and  the  exact  symmetry  of  the  Lagrangian  is  preserved.  The  spectrum  is 
simply  that  of  ordinary  QED  of  charged  scalars  with  a  single  massless  photon 
and  two  scalar  particles. 

The  case  of  /j2  =  —  |/i2|  <  0,  however,  corresponds  to  that  of  spontaneously 
broken  symmetry  and  requires  a  closer  analysis.  The  potential  has  a  continuum  of 
absolute  minima,  corresponding  to  a  continuum  of  degenerate  vacua,  at 

<  |0|2  >0=  -/i2/2|A|  =  v2/2.  (2.37) 

We  now  shift  the  fields  in  order  to  rewrite  the  Lagrangian  in  terms  of  displacements 
from  the  physical  vacuum.  The  latter  may  be  chosen,  without  loss  of  generality, 
as 

<  (j>  >0=  v/y/2, 

where  v  >  0  is  a  real  number.  We  then  define  the  shifted  field 

(f>  =  eiVv{v  +  r))/y/2,  (2.38) 
and  the  Lagrangian  for  small  oscillations  becomes 

-\F^V  +  qvA,(d\)  +  ^A^  +  ...  (2.39) 


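As we expect from our study of the Goldstone phenomenon, the $\eta$-field, which corresponds to radial oscillations, has a mass $m^2 = -2\mu^2 > 0$. The gauge field acquires a mass, but it is mixed up with the massless $\zeta$-field. Note that the terms involving $A_\mu$ and $\zeta$ can be written as

$$ \frac{q^2v^2}{2}\left(A_\mu + \frac{1}{qv}\partial_\mu\zeta\right)\left(A^\mu + \frac{1}{qv}\partial^\mu\zeta\right), \tag{2.40} $$

which suggests the gauge transformation

$$ A_\mu \to A'_\mu = A_\mu + \frac{1}{qv}\partial_\mu\zeta. \tag{2.41} $$

This gauge transformation corresponds to the phase rotation on the scalar field

$$ \phi \to \phi' = e^{-i\zeta(x)/v}\phi(x) = (v + \eta)/\sqrt{2}. \tag{2.42} $$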

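Since the original Lagrangian is locally gauge-invariant, we may return to the original expression and rewrite it as

$$ \mathcal{L} = \frac{1}{2}\left[(\partial_\mu\eta)(\partial^\mu\eta) + 2\mu^2\eta^2\right] - \frac{1}{4}F_{\mu\nu}F^{\mu\nu} + \frac{q^2v^2}{2}A'_\mu A'^\mu. \tag{2.43} $$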
We  can  make  the  following  observations  about  the  above  Lagrangian. 

•  First, an $\eta$-field with mass $m^2 = -2\mu^2 > 0$ is introduced, which is called the Higgs boson.

•  Second, the vector field $A'_\mu$ has acquired a mass $m = qv$.

•  Third, the unwanted $\zeta$-field disappears; its degree of freedom reappears as the longitudinal polarization of the gauge field, through the gauge mass term.

Therefore, in the context of the Higgs mechanism, the massless gauge field absorbs the massless Goldstone boson to become a massive vector boson. This particular choice of gauge is referred to as the unitary gauge, because only physical states appear in the Lagrangian. These results suggest that it is possible to construct spontaneously broken gauge theories in which the interactions are mediated by massive vector bosons, rather than the phenomenologically unacceptable massless vector bosons of the unbroken theories.
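
The spectrum of the Abelian Higgs model can be checked numerically. The following Python fragment is only a minimal sketch of the tree-level relations quoted above ($v^2 = -\mu^2/|\lambda|$, $m_\eta^2 = -2\mu^2$, $m_A = qv$); the function name `abelian_higgs_spectrum` and the sample parameter values are illustrative assumptions and do not come from the text.

```python
import math


def abelian_higgs_spectrum(mu2: float, lam: float, q: float) -> dict:
    """Tree-level masses in the Abelian Higgs model (hypothetical helper).

    mu2 is the parameter mu^2 of the potential, lam is |lambda| > 0,
    and q is the gauge coupling (charge) of the scalar field.
    """
    if lam <= 0:
        raise ValueError("lam must be positive for the potential to be bounded")
    if mu2 >= 0:
        # Unbroken phase: massless photon, charged scalars of mass mu.
        return {"v": 0.0, "m_scalar": math.sqrt(mu2), "m_A": 0.0}
    v = math.sqrt(-mu2 / lam)  # from <|phi|^2>_0 = -mu^2 / (2|lambda|) = v^2 / 2
    return {"v": v, "m_scalar": math.sqrt(-2.0 * mu2), "m_A": abs(q) * v}


spec = abelian_higgs_spectrum(mu2=-1.0, lam=0.5, q=0.3)
print(spec)
```

For $\mu^2 \geq 0$ the same function returns a massless gauge field, which mirrors the first of the two cases discussed above.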


2.5    Electroweak  Theory  of  the  Standard  Model 

To begin with, we just consider the electron and the neutrino. For each of these particles there is a corresponding Dirac field operator. Experimentally, it was known that in $\beta$-decay only the left-handed part $\nu_{eL}$ of the neutrino field couples. The Dirac field operator for the electron can similarly be split into a left-handed and right-handed part by setting

$$ \psi(x) = \psi_L(x) + \psi_R(x), \tag{2.44} $$

where for the lepton field:

$$ e_L(x) = \frac{1}{2}(1 - \gamma_5)\,e(x) \tag{2.45} $$
$$ e_R(x) = \frac{1}{2}(1 + \gamma_5)\,e(x) \tag{2.46} $$
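
The splitting into left- and right-handed parts rests on the projector properties of $\frac{1}{2}(1 \mp \gamma_5)$. As a hedged illustration, the sketch below checks these properties numerically; it assumes the chiral (Weyl) representation, in which $\gamma_5$ is diagonal, and the helper names and the sample spinor are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(m, vec):
    """Apply a matrix to a column vector."""
    return [sum(m[i][k] * vec[k] for k in range(len(vec))) for i in range(len(m))]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# gamma_5 in the chiral (Weyl) representation: an assumed basis choice.
GAMMA5 = [[-1.0, 0.0, 0.0, 0.0],
          [0.0, -1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

# Chirality projectors of Eqs. (2.45) and (2.46).
P_L = [[0.5 * (IDENTITY[i][j] - GAMMA5[i][j]) for j in range(4)] for i in range(4)]
P_R = [[0.5 * (IDENTITY[i][j] + GAMMA5[i][j]) for j in range(4)] for i in range(4)]

e = [1.0, 2.0, 3.0, 4.0]  # arbitrary sample spinor components
e_L = apply(P_L, e)
e_R = apply(P_R, e)
print(e_L, e_R)
```

In this basis the two projections simply pick out the upper and lower two components, and their sum reproduces the original spinor, as Eq. (2.44) requires.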

Due to the electron mass term, $e_L$ and $e_R$ are not solutions to the Dirac equation, and a strict connection between helicity and the eigenvalue of $\gamma_5$ does not hold. We form a doublet out of the left-handed electron and neutrino and form the following Lagrangian density, with the electromagnetic field switched off at first.

$$ \mathcal{L}_l(x) = \left(\bar\nu_{eL}(x), \bar e_L(x)\right)(i\gamma^\mu\partial_\mu)\begin{pmatrix}\nu_{eL}(x)\\ e_L(x)\end{pmatrix} + \bar e_R(x)\,i\gamma^\mu\partial_\mu\,e_R(x). \tag{2.47} $$


This expression exhibits symmetry between the left-handed neutrino and electron fields while the right-handed electron is treated separately. In mathematical language, we say that the Lagrangian is invariant under $SU(2)$ transformations in the space of the left-handed electron-neutrino pair, while the right-handed electron is a singlet under $SU(2)$ transformations. Let us denote the left-handed doublet as follows

$$ L(x) = \begin{pmatrix}\nu_{eL}(x)\\ e_L(x)\end{pmatrix} \tag{2.48} $$

Then we can say that for a global $x$-independent transformation $U$ on the lepton doublet

$$ L(x) \to UL(x), \tag{2.49} $$

the Lagrangian is invariant

$$ \mathcal{L}_l(x) \to \mathcal{L}_l(x). \tag{2.50} $$

Under the gauge theory of the standard model this global invariance is extended to local transformations. That is, we postulate invariance of the Lagrangian under local $SU(2)$ transformations

$$ L(x) \to U(x)L(x), \tag{2.51} $$

where $U(x) \in SU(2)$ for all $x$. This transformation, however, is not a symmetry of the Lagrangian due to the derivative term. To rescue the invariance a vector field is introduced which, combined with the derivative, forms a covariant derivative leaving the overall Lagrangian invariant under the local $SU(2)$ transformation.
We need the same number of real vector fields as there are generators in the invariance group. In the case of $SU(2)$ we must introduce three vector fields. For the fundamental representation of the $SU(2)$ group we choose the generators to be the Pauli matrices $\tau_1$, $\tau_2$, and $\tau_3$. The corresponding vector fields will be denoted as $W^1_\mu$, $W^2_\mu$, and $W^3_\mu$, and we combine them into a Hermitian $2\times2$ matrix with zero trace:

$$ W_\mu(x) = W^a_\mu(x)\frac{\tau_a}{2}. \tag{2.52} $$


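Then the field strength matrix can be written as

$$ W_{\mu\nu}(x) = \partial_\mu W_\nu(x) - \partial_\nu W_\mu(x) + ig\left[W_\mu(x), W_\nu(x)\right] \tag{2.53} $$
$$ \phantom{W_{\mu\nu}(x)} = W^a_{\mu\nu}(x)\frac{\tau_a}{2}, \tag{2.54} $$

where

$$ W^a_{\mu\nu}(x) = \partial_\mu W^a_\nu(x) - \partial_\nu W^a_\mu(x) - g\,\epsilon_{abc}W^b_\mu(x)W^c_\nu(x). \tag{2.55} $$

Here $\epsilon_{abc}$ are the structure constants of the $SU(2)$ group and $g$ is a gauge coupling constant. The leptonic Lagrangian can now be written as

$$ \mathcal{L}(x) = \bar L(x)\,i\gamma^\mu\left(\partial_\mu + igW_\mu\right)L(x) + \bar e_R(x)\,i\gamma^\mu\partial_\mu\,e_R(x) - \frac{1}{2}\mathrm{Tr}\left(W_{\mu\nu}(x)W^{\mu\nu}(x)\right). \tag{2.56} $$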

This Lagrangian density is invariant under $SU(2)$ gauge transformations. That is

$$ W_\mu(x) \to U(x)W_\mu(x)U^\dagger(x) - \frac{i}{g}U(x)\partial_\mu U^\dagger(x) \tag{2.57} $$
$$ L(x) \to U(x)L(x) \tag{2.58} $$
$$ e_R(x) \to e_R(x), \tag{2.59} $$
$$ \mathcal{L}(x) \to \mathcal{L}(x), \tag{2.60} $$

where $U(x) \in SU(2)$. The gauge group that we have just introduced is referred to as the weak isospin group. The field $L(x)$ forms a weak doublet, whereas $e_R$ represents a singlet.

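Let us now study in more detail the electroweak coupling that originated from the requirement of local $SU(2)$ invariance. Let us define

$$ W^\pm_\mu = \frac{1}{\sqrt{2}}\left(W^1_\mu \mp iW^2_\mu\right). \tag{2.61} $$

The electroweak coupling sector of the above Lagrangian becomes

$$ \mathcal{L}_{LW} = -g\,\bar L\gamma^\mu W_\mu L \tag{2.62} $$
$$ \phantom{\mathcal{L}_{LW}} = -\frac{g}{2}\left(\bar\nu_{eL}, \bar e_L\right)\gamma^\mu\begin{pmatrix}W^3_\mu & \sqrt{2}\,W^+_\mu\\ \sqrt{2}\,W^-_\mu & -W^3_\mu\end{pmatrix}\begin{pmatrix}\nu_{eL}\\ e_L\end{pmatrix} $$
$$ \phantom{\mathcal{L}_{LW}} = -\frac{g}{2}\left(W^3_\mu\left(\bar\nu_{eL}\gamma^\mu\nu_{eL} - \bar e_L\gamma^\mu e_L\right) + \sqrt{2}\,W^+_\mu\bar\nu_{eL}\gamma^\mu e_L + \sqrt{2}\,W^-_\mu\bar e_L\gamma^\mu\nu_{eL}\right). \tag{2.63} $$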

A close look at the above equation shows that the $SU(2)$ gauge principle leads to current terms with structures of the form $\gamma^\mu(1 - \gamma_5)$. Hence, this leads to the $(V - A)$ theory of weak interactions in a natural way. Unfortunately, in the Lagrangian density there is nothing that could generate mass for the $W$-boson. Electromagnetism is also missing. It is known that the $W$-boson couples to the left-handed neutrino $\nu_{eL}$ and to the left-handed electron $e_L$, but not to the right-handed electron $e_R$. Thus it is not possible to identify $W^3_\mu$ with the photon field. Indeed, the photon does not couple to the neutrino but to the left- and right-handed electrons. To extend the theory to include electromagnetism we go back to the Lagrangian density again. Until now we have only considered invariance under $SU(2)$ transformations. However, $\mathcal{L}(x)$ is also invariant under phase transformations corresponding to $U(1)$ groups. We now try to extend this global gauge invariance to a local one by considering $U(1)$ transformations of the form

$$ L(x) \to e^{iy_L\chi}L(x) \tag{2.64} $$
$$ e_R(x) \to e^{iy_R\chi}e_R(x) \tag{2.65} $$

where $y_L$ and $y_R$ are fixed numbers whose values will be determined later. The operator generating this group will be referred to as "weak hypercharge". We can form a spinor $\psi(x)$ out of these fermions with the $U(1)$ transformations as follows.

$$ \psi(x) = \begin{pmatrix}\nu_{eL}(x)\\ e_L(x)\\ e_R(x)\end{pmatrix} \to \psi'(x) = e^{i\chi Y}\psi(x), \tag{2.66} $$

where

$$ Y = \begin{pmatrix}y_L & 0 & 0\\ 0 & y_L & 0\\ 0 & 0 & y_R\end{pmatrix}. \tag{2.67} $$

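As in electrodynamics, we introduce a real vector field $B_\mu$ and an accompanying gauge coupling constant $g'$. The field strength tensor for this Abelian gauge field is exactly as that in electrodynamics

$$ B_{\mu\nu} = \partial_\mu B_\nu - \partial_\nu B_\mu. \tag{2.68} $$

The Lagrangian density including the $SU(2)$ gauge fields and the $U(1)$ gauge field takes the form of

$$ \mathcal{L} = -\frac{1}{2}\mathrm{Tr}\left(W_{\mu\nu}W^{\mu\nu}\right) - \frac{1}{4}B_{\mu\nu}B^{\mu\nu} + \bar\psi\,i\gamma^\mu D_\mu\psi, \tag{2.69} $$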


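where

$$ D_\mu\psi(x) = \left(\partial_\mu + igW^a_\mu(x)T_a + ig'B_\mu(x)Y\right)\psi(x) \tag{2.70} $$

is the covariant derivative. The matrices $T_a$ ($a = 1,2,3$) are $3\times3$ matrices of the form

$$ T_a = \begin{pmatrix}\frac{1}{2}\tau_a & 0\\ 0 & 0\end{pmatrix}. $$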

The $T_a$ matrices together with the hypercharge matrix $Y$ form a representation of the generators of the group $SU(2)\times U(1)$ since they obey the following commutation rules.

$$ [T_a, T_b] = i\epsilon_{abc}T_c, \tag{2.71} $$
$$ [T_a, Y] = 0. \tag{2.72} $$
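
These commutation rules can be verified directly. The sketch below is an illustration only: it builds the $3\times3$ matrices $T_a$ by placing $\tau_a/2$ in the upper-left block, takes $Y = \mathrm{diag}(y_L, y_L, y_R)$ with the values $y_L = -\frac{1}{2}$, $y_R = -1$ adopted in this chapter, and checks both relations. All helper names are hypothetical.

```python
from itertools import product


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def levi_civita(a, b, c):
    """Totally antisymmetric symbol for indices taken from 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) // 2


PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def embed(tau):
    """Place tau/2 in the upper-left 2x2 block of a 3x3 matrix."""
    t = [[0j] * 3 for _ in range(3)]
    for i in range(2):
        for j in range(2):
            t[i][j] = 0.5 * tau[i][j]
    return t


T = [embed(tau) for tau in PAULI]
Y_L, Y_R = -0.5, -1.0  # hypercharge values used in the text
Y = [[Y_L, 0, 0], [0, Y_L, 0], [0, 0, Y_R]]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


def algebra_holds():
    """Check [T_a, T_b] = i eps_abc T_c and [T_a, Y] = 0."""
    for a, b in product(range(3), repeat=2):
        expected = [[sum(1j * levi_civita(a, b, c) * T[c][i][j] for c in range(3))
                     for j in range(3)] for i in range(3)]
        if not close(commutator(T[a], T[b]), expected):
            return False
    zero = [[0] * 3 for _ in range(3)]
    return all(close(commutator(T[a], Y), zero) for a in range(3))


print(algebra_holds())
```

The second relation holds because $Y$ is proportional to the unit matrix within the doublet block, which is why both members of the doublet must carry the same hypercharge $y_L$.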




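The Lagrangian sector involving the coupling of the fermion fields to gauge fields, referred to as the Weyl-Dirac Lagrangian, $\mathcal{L}_{WD}$, can now be expanded as follows.

$$ \mathcal{L}_{WD} = -\bar\psi\gamma^\mu\left(gW^a_\mu T_a + g'B_\mu Y\right)\psi $$
$$ \phantom{\mathcal{L}_{WD}} = -\frac{g}{\sqrt{2}}\left(W^+_\mu\bar\nu_{eL}\gamma^\mu e_L + W^-_\mu\bar e_L\gamma^\mu\nu_{eL}\right) $$
$$ \phantom{\mathcal{L}_{WD} =} - \frac{1}{2}\left(gW^3_\mu + 2y_Lg'B_\mu\right)\bar\nu_{eL}\gamma^\mu\nu_{eL} $$
$$ \phantom{\mathcal{L}_{WD} =} + \frac{1}{2}\left(gW^3_\mu - 2y_Lg'B_\mu\right)\bar e_L\gamma^\mu e_L - y_Rg'B_\mu\bar e_R\gamma^\mu e_R. \tag{2.73} $$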

We now attempt to arrange the terms containing $W^3_\mu$ and $B_\mu$ in such a way that the electromagnetic coupling can be identified. Note that one of the constants $y_L$ or $y_R$ can be freely chosen since we also have $g'$ as a free parameter. We choose

$$ y_L = -\frac{1}{2}. \tag{2.74} $$

At the level discussed so far, the two vector fields $W^3_\mu$ and $B_\mu$ are on an equal footing as they are both neutral and both massless. Any two orthogonal linear combinations of these fields form an equivalent basis. We define the following combination, which is the one that couples to the neutrino in 2.73, denoted by $Z_\mu$

$$ Z_\mu = \frac{1}{\sqrt{g^2 + g'^2}}\left(gW^3_\mu - g'B_\mu\right). \tag{2.75} $$

Since the neutrino has no electric charge, $Z_\mu$ cannot contain any component of the photon field, $A_\mu$. The candidate for the latter is given by the linear combination that is orthogonal to $Z_\mu$

$$ A_\mu = \frac{1}{\sqrt{g^2 + g'^2}}\left(g'W^3_\mu + gB_\mu\right). \tag{2.76} $$


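Defining

$$ \sin\theta_W = \frac{g'}{\sqrt{g^2 + g'^2}} \tag{2.77} $$
$$ \cos\theta_W = \frac{g}{\sqrt{g^2 + g'^2}} \tag{2.78} $$

we can rewrite 2.75 and 2.76 as

$$ Z_\mu = \cos\theta_W W^3_\mu - \sin\theta_W B_\mu \tag{2.79} $$
$$ A_\mu = \sin\theta_W W^3_\mu + \cos\theta_W B_\mu. \tag{2.80} $$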

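The angle $\theta_W$ is known as the weak angle. If we replace $W^3_\mu$ and $B_\mu$ in $\mathcal{L}_{WD}$ by $Z_\mu$ and $A_\mu$ then we obtain:

$$ \mathcal{L}_{WD} = -\frac{g}{\sqrt{2}}\left(W^+_\mu\bar\nu_{eL}\gamma^\mu e_L + W^-_\mu\bar e_L\gamma^\mu\nu_{eL}\right) $$
$$ \phantom{\mathcal{L}_{WD} =} - \sqrt{g^2 + g'^2}\,Z_\mu\left[\frac{1}{2}\bar\nu_{eL}\gamma^\mu\nu_{eL} - \frac{1}{2}\bar e_L\gamma^\mu e_L - \sin^2\theta_W\left(-\bar e_L\gamma^\mu e_L + y_R\bar e_R\gamma^\mu e_R\right)\right] $$
$$ \phantom{\mathcal{L}_{WD} =} - \frac{gg'}{\sqrt{g^2 + g'^2}}A_\mu\left(-\bar e_L\gamma^\mu e_L + y_R\bar e_R\gamma^\mu e_R\right). \tag{2.81} $$

We can, thus, attain the correct form of the electromagnetic coupling if we set

$$ y_R = -1, \qquad e = \frac{gg'}{\sqrt{g^2 + g'^2}}. $$

Using the definition of the weak angle in 2.77 we can write

$$ e = g\sin\theta_W = g'\cos\theta_W. \tag{2.82} $$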
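
The relations between the couplings and the weak angle are easy to check numerically. The following sketch is illustrative only: the numerical values of $g$ and $g'$ are arbitrary assumptions, and the function names are hypothetical. It confirms that $e = g\sin\theta_W = g'\cos\theta_W$ and that a pure photon configuration has no overlap with the combination $gW^3_\mu - g'B_\mu$ to which the neutrino couples.

```python
import math


def weak_mixing(g: float, g_prime: float):
    """Return (sin_w, cos_w, e) following Eqs. (2.77), (2.78) and (2.82)."""
    norm = math.hypot(g, g_prime)  # sqrt(g^2 + g'^2)
    return g_prime / norm, g / norm, g * g_prime / norm


def to_mass_basis(w3: float, b: float, sin_w: float, cos_w: float):
    """Rotate (W3, B) field amplitudes into (Z, A), Eqs. (2.79) and (2.80)."""
    z = cos_w * w3 - sin_w * b
    a = sin_w * w3 + cos_w * b
    return z, a


g, g_prime = 0.65, 0.35  # illustrative values only
sin_w, cos_w, e = weak_mixing(g, g_prime)

# A pure photon configuration, (W3, B) = (sin_w, cos_w), gives Z = 0 and A = 1.
z, a = to_mass_basis(sin_w, cos_w, sin_w, cos_w)

# The neutrino couples to g*W3 - g'*B, which vanishes for this configuration.
neutrino_coupling = g * sin_w - g_prime * cos_w
print(e, z, a, neutrino_coupling)
```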

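The Lagrangian density $\mathcal{L}_{WD}$ can be written in a more compact form as follows

$$ \mathcal{L}_{WD} = -e\left(A_\mu J^\mu_{EM} + \frac{1}{\sqrt{2}\sin\theta_W}\left(W^+_\mu\bar\nu_{eL}\gamma^\mu e_L + W^-_\mu\bar e_L\gamma^\mu\nu_{eL}\right) + \frac{1}{\sin\theta_W\cos\theta_W}Z_\mu J^\mu_{NC}\right) \tag{2.83} $$

where

$$ J^\mu_{EM} = -\bar e_L\gamma^\mu e_L - \bar e_R\gamma^\mu e_R = -\bar e\gamma^\mu e, \tag{2.84} $$
$$ J^\mu_{NC} = \frac{1}{2}\bar\nu_{eL}\gamma^\mu\nu_{eL} - \frac{1}{2}\bar e_L\gamma^\mu e_L - \sin^2\theta_W J^\mu_{EM}. \tag{2.85} $$

The electromagnetic current is denoted here by $J^\mu_{EM}$, and $J^\mu_{NC}$ denotes the neutral current which couples to the $Z$-boson.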

Let us discuss these results in more detail. By gauging the group $SU(2)\times U(1)$ we have attained a coupling structure that can describe the electromagnetic interaction and the charged current processes of the weak interactions. However, we have also attained an additional neutral boson $Z$. This is not unwelcome, because the existence of the neutral weak current has been known since 1973. The necessity for the $Z$-boson stemmed from the requirement that the charged weak current with its $(V - A)$ structure and the electromagnetic current with its pure vector structure should be brought into a unified system. To do so, it is necessary to choose a suitable value for $y_R$. The neutral field $Z_\mu$ and the photon field $A_\mu$ are linear combinations of the $W^3_\mu$ and the $B_\mu$. The fields $W^3_\mu$ and $B_\mu$ are in turn the third isospin partner of the charged fields $W^\pm_\mu$ and the gauge field for the hypercharge transformation, respectively. The weak mixing angle $\theta_W$ remains a free parameter in this theory, whose value can only be determined by experiment.

Our  theory  is  not  realistic  in  this  form,  because  we  have  not  given  masses  to 
the  leptons  and  to  the  bosons  W±  and  Z.  The  masses  of  the  bosons  W±  and  Z 
must  be  very  large  so  that  we  obtain  an  almost  point-like  four-fermion  coupling  at 
low  energies.  One  could  simply  try  to  add  explicit  mass  terms  to  the  Lagrangian 
density.  However,  this  violates  gauge  invariance  and,  as  it  turns  out,  leads  to  a 
theory  which  is  not  renormalizable.  A  possible  way  out  of  these  difficulties  was 
presented  in  the  work  of  Weinberg  and  Salam  in  which  they  used  the  phenomenon 
of  spontaneous  symmetry  breaking. 




2.6    The  Higgs  Field  and  the  Extended  Lagrangian 


To introduce the masses without violating the gauge invariance and renormalizability we follow the discussion presented in sections 2.3 and 2.4. That is, we introduce additional scalar fields, the so-called Higgs fields, to the Lagrangian. In its minimal form, it is sufficient to introduce two complex scalar fields $\phi_1$ and $\phi_2$ and demand that these fields form a doublet under the transformations of the weak isospin group.

$$ \phi(x) = \begin{pmatrix}\phi_1(x)\\ \phi_2(x)\end{pmatrix} \tag{2.86} $$

The corresponding Lagrangian is

$$ \mathcal{L}_\phi = (D^\mu\phi)^\dagger(D_\mu\phi) - V(\phi), \tag{2.87} $$

where $D_\mu$ is the covariant derivative and $V$ is

$$ V(\phi) = \mu^2\phi^\dagger\phi + |\lambda|(\phi^\dagger\phi)^2. \tag{2.88} $$

As before, we require that $\lambda > 0$ for the theory to be stable. We also require that $\mu^2 < 0$ so that it leads to the spontaneous symmetry breaking and to massive gauge fields. The minimum of the potential, and therefore of the total energy, is

$$ \langle\phi^\dagger\phi\rangle_0 = -\mu^2/2|\lambda| = v^2/2. \tag{2.89} $$


The last expression gives a condition on the "length" of the field $\phi$. The orientation of the ground-state field in the two-dimensional isospin space is not determined. Hence, $\phi$ with length $v/\sqrt{2}$ can be parameterized as follows:

$$ \phi = e^{i(\vec\tau/2)\cdot\vec\theta}\begin{pmatrix}0\\ \frac{1}{\sqrt{2}}v\end{pmatrix} \tag{2.90} $$

Here $\vec\theta$ is a vector in isospin space with $|\vec\theta| < 2\pi$. The ground state is therefore infinitely degenerate.

                T       T_3      Y       Q
    phi_1(x)   1/2     +1/2     1/2      1
    phi_2(x)   1/2     -1/2     1/2      0

Table 2.1: The quantum numbers of the Higgs boson.

Each of the individual equivalent ground states, for example, the field configuration

$$ \phi_0 = \begin{pmatrix}0\\ \frac{1}{\sqrt{2}}v\end{pmatrix} \tag{2.91} $$

is not invariant under $SU(2)$ transformations. The $SU(2)$ group, which is a symmetry of the Lagrangian density, is spontaneously broken by the ground state. Next, we define a new field $\phi'(x)$ by means of

$$ \phi'(x) = \phi(x) - \langle0|\phi(x)|0\rangle, \tag{2.92} $$

whose vacuum expectation value is zero:

$$ \langle0|\phi'(x)|0\rangle = 0. \tag{2.93} $$

It can be shown that by using the field $\phi'(x)$, perturbation theory may be formulated in the usual manner.

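We will now allow the Higgs field to interact with the gauge bosons and the fermions. The structure of the coupling should, of course, respect the gauge invariance under the $SU(2)\times U(1)$ group of weak isospin and weak hypercharge. We would also like to have a renormalizable theory. The field $\phi(x)$ is a weak isodoublet. We can thus form an isospin invariant coupling term as follows.

$$ \mathcal{L}_{Yuk} = -c_e\,\bar e_R\phi^\dagger L + \mathrm{h.c.} $$
$$ \phantom{\mathcal{L}_{Yuk}} = -c_e\,\bar e_R\left(\phi_1^*, \phi_2^*\right)\begin{pmatrix}\nu_{eL}\\ e_L\end{pmatrix} + \mathrm{h.c.} $$
$$ \phantom{\mathcal{L}_{Yuk}} = -c_e\left(\phi_1^*\bar e_R\nu_{eL} + \phi_2^*\bar e_Re_L\right) + \mathrm{h.c.} \tag{2.94} $$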


Here $c_e$ is a coupling constant. The Yukawa term is invariant under local isospin transformations. We now demand that the coupling is invariant under hypercharge transformations. This yields the condition

$$ y_H = y_L - y_R = \frac{1}{2}. \tag{2.95} $$


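Adding the Yukawa term to the Lagrangian, we obtain the following gauge invariant Lagrangian density

$$ \mathcal{L} = -\frac{1}{2}\mathrm{Tr}\left(W_{\mu\nu}W^{\mu\nu}\right) - \frac{1}{4}B_{\mu\nu}B^{\mu\nu} + \left(\bar\nu_{eL}, \bar e_L\right)i\gamma^\mu D_\mu\begin{pmatrix}\nu_{eL}\\ e_L\end{pmatrix} $$
$$ \phantom{\mathcal{L} =} + \bar e_R\,i\gamma^\mu D_\mu\,e_R - c_e\,\bar e_R\phi^\dagger\begin{pmatrix}\nu_{eL}\\ e_L\end{pmatrix} - c_e^*\left(\bar\nu_{eL}, \bar e_L\right)\phi\,e_R $$
$$ \phantom{\mathcal{L} =} + (D_\mu\phi)^\dagger(D^\mu\phi) - V(\phi). \tag{2.96} $$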

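The Lagrangian density 2.96 is invariant under gauge transformations of the $SU(2)\times U(1)$ group. We can list the explicit form of these transformations as follows.

$SU(2)$ gauge transformations:

$$ W_\mu(x) \to U(x)W_\mu(x)U^\dagger(x) - \frac{i}{g}U(x)\partial_\mu U^\dagger(x), $$
$$ L(x) \to U(x)L(x), $$
$$ e_R(x) \to e_R(x), $$
$$ \phi(x) \to U(x)\phi(x), \tag{2.97} $$

where $U(x) \in SU(2)$ and we choose $U(x) = \exp\left(i\theta_a(x)\tau_a/2\right)$ with $\theta_a(x)$ ($a = 1,2,3$) being arbitrary real functions of $x$.

$U(1)$ gauge transformations:

$$ W_\mu(x) \to W_\mu(x), $$
$$ B_\mu(x) \to B_\mu(x) - \frac{1}{g'}\partial_\mu\chi(x), $$
$$ L(x) \to e^{iy_L\chi(x)}L(x), $$
$$ e_R(x) \to e^{iy_R\chi(x)}e_R(x), $$
$$ \phi(x) \to e^{iy_H\chi(x)}\phi(x). \tag{2.98} $$

Here $\chi(x)$ is an arbitrary real function of $x$.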

Using the gauge transformation we can rotate the Higgs field, $\phi$, into any direction in isospin space. Therefore, according to 2.90 there is always an $SU(2)$ gauge transformation which allows us to write the Higgs field as follows

$$ U(x)\phi(x) = \begin{pmatrix}0\\ \frac{1}{\sqrt{2}}\rho(x)\end{pmatrix} \tag{2.99} $$

where $\rho(x) > 0$. This choice of gauge is called "unitary gauge". For small fluctuations, the vacuum expectation value of the Higgs field is given by the minimum of the potential:

$$ \langle0|\rho(x)|0\rangle = v \tag{2.100} $$
$$ \langle0|\phi(x)|0\rangle = \begin{pmatrix}0\\ \frac{1}{\sqrt{2}}v\end{pmatrix} \tag{2.101} $$

This vacuum expectation value is obviously not invariant under the full $SU(2)\times U(1)$ gauge group. Only the $U(1)$ subgroup, which is generated by $Q = T_3 + Y$, leaves the vacuum expectation value of $\phi$ invariant. This remaining symmetry, we will see, corresponds to the gauge group of electromagnetism.
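
The statement that $Q = T_3 + Y$ leaves the vacuum invariant amounts to the lower Higgs component carrying zero charge. The following sketch, given only as an illustration, tabulates $Q$ for the leptons and the Higgs doublet using the hypercharges adopted in this chapter ($y_L = -\frac{1}{2}$, $y_R = -1$, $y_H = \frac{1}{2}$); the dictionary layout and field labels are hypothetical.

```python
from fractions import Fraction as F

# (T3, Y) assignments used in this chapter; the labels are illustrative.
QUANTUM_NUMBERS = {
    "nu_eL": (F(1, 2), F(-1, 2)),
    "e_L": (F(-1, 2), F(-1, 2)),
    "e_R": (F(0), F(-1)),
    "phi_1": (F(1, 2), F(1, 2)),
    "phi_2": (F(-1, 2), F(1, 2)),
}


def electric_charge(t3: F, y: F) -> F:
    """Electric charge from the relation Q = T3 + Y."""
    return t3 + y


charges = {name: electric_charge(t3, y) for name, (t3, y) in QUANTUM_NUMBERS.items()}
for name, q in charges.items():
    print(f"{name:6s} Q = {q}")
```

Since the vacuum expectation value lies entirely in the component labelled `phi_2`, which has $Q = 0$, the transformation $e^{i\alpha Q}$ acts trivially on it, while the neutrino and both chiralities of the electron receive their observed charges.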

To derive mass terms for the vector bosons we study the Higgs kinetic term in more detail. We shift the field $\rho(x)$ by an amount equal to its vacuum expectation value:

$$ h(x) = \rho(x) - v, \tag{2.102} $$

so that

$$ \langle0|h(x)|0\rangle = 0. \tag{2.103} $$
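To identify the masses we look at the relevant portion of the Lagrangian. We can write

$$ \langle0|\phi^\dagger|0\rangle\left(-igW^a_\mu\frac{\tau_a}{2} - ig'B_\mu y_H\right)\left(igW^{a\mu}\frac{\tau_a}{2} + ig'B^\mu y_H\right)\langle0|\phi|0\rangle $$
$$ = \left(0, \frac{v}{\sqrt{2}}\right)\frac{1}{2}\begin{pmatrix}\frac{2gg'A_\mu + (g^2 - g'^2)Z_\mu}{\sqrt{g^2 + g'^2}} & \sqrt{2}\,gW^+_\mu\\ \sqrt{2}\,gW^-_\mu & -\sqrt{g^2 + g'^2}\,Z_\mu\end{pmatrix}\frac{1}{2}\begin{pmatrix}\frac{2gg'A^\mu + (g^2 - g'^2)Z^\mu}{\sqrt{g^2 + g'^2}} & \sqrt{2}\,gW^{+\mu}\\ \sqrt{2}\,gW^{-\mu} & -\sqrt{g^2 + g'^2}\,Z^\mu\end{pmatrix}\begin{pmatrix}0\\ \frac{v}{\sqrt{2}}\end{pmatrix} $$
$$ = \frac{g^2v^2}{4}W^-_\mu W^{+\mu} + \frac{(g^2 + g'^2)v^2}{8}Z_\mu Z^\mu. \tag{2.104} $$

Hence, symmetry breaking leads to the generation of the vector boson masses with the following mass terms.

$$ M_W^2 = \frac{g^2v^2}{4} = \frac{e^2v^2}{4\sin^2\theta_W} \tag{2.105} $$
$$ M_Z^2 = \frac{(g^2 + g'^2)v^2}{4} = \frac{e^2v^2}{4\sin^2\theta_W\cos^2\theta_W} = \frac{M_W^2}{\cos^2\theta_W} \tag{2.106} $$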
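
The mass relations can be made concrete with a short numerical sketch. The inputs below ($g$, $g'$, and $v$ in GeV) are illustrative assumptions chosen only to exercise the formulas, not values taken from this work, and the function name is hypothetical. The point of the example is the ratio $M_W/M_Z = \cos\theta_W$, which does not depend on $v$.

```python
import math


def gauge_boson_masses(g: float, g_prime: float, v: float):
    """Tree-level M_W and M_Z from M_W^2 = g^2 v^2 / 4, M_Z^2 = (g^2 + g'^2) v^2 / 4."""
    m_w = 0.5 * g * v
    m_z = 0.5 * math.hypot(g, g_prime) * v
    return m_w, m_z


g, g_prime, v = 0.65, 0.35, 246.0  # illustrative inputs; v in GeV
m_w, m_z = gauge_boson_masses(g, g_prime, v)
cos_w = g / math.hypot(g, g_prime)
print(f"M_W = {m_w:.1f} GeV, M_Z = {m_z:.1f} GeV, M_W/M_Z = {m_w / m_z:.4f}")
```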


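We can find the fermion mass parameters by studying the Yukawa sector. Taking $c_e$ real,

$$ -c_e\,\bar e_R\langle0|\phi^\dagger|0\rangle\begin{pmatrix}\nu_{eL}\\ e_L\end{pmatrix} - c_e\left(\bar\nu_{eL}, \bar e_L\right)\langle0|\phi|0\rangle\,e_R = -c_e\frac{v}{\sqrt{2}}\left(\bar e_Re_L + \bar e_Le_R\right) = -c_e\frac{v}{\sqrt{2}}\,\bar ee, \tag{2.107} $$

which has exactly the form of a mass term for the electron with $m_e$ given by

$$ m_e = c_e\frac{v}{\sqrt{2}}. \tag{2.108} $$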

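Therefore, spontaneous symmetry breaking generates a mass term for the electron while the neutrino remains massless. The total Lagrangian density after making the transition to the field $h(x)$ has the following form:

$$ \mathcal{L} = -\frac{1}{2}\mathrm{Tr}\left(W_{\mu\nu}W^{\mu\nu}\right) - \frac{1}{4}B_{\mu\nu}B^{\mu\nu} + \bar\nu_{eL}\,i\gamma^\mu\partial_\mu\,\nu_{eL} + \bar e\,i\gamma^\mu\partial_\mu\,e $$
$$ \phantom{\mathcal{L} =} + W^+_\mu W^{-\mu}M_W^2\left(1 + \frac{h}{v}\right)^2 + \frac{1}{2}Z_\mu Z^\mu M_Z^2\left(1 + \frac{h}{v}\right)^2 $$
$$ \phantom{\mathcal{L} =} - m_e\bar ee\left(1 + \frac{h}{v}\right) + \frac{1}{2}\partial_\mu h\,\partial^\mu h \tag{2.109} $$
$$ \phantom{\mathcal{L} =} - \frac{1}{2}M_h^2h^2\left(1 + \frac{h}{v} + \frac{h^2}{4v^2}\right) $$
$$ \phantom{\mathcal{L} =} - e\left(A_\mu J^\mu_{EM} + \frac{1}{\sqrt{2}\sin\theta_W}\left(W^+_\mu\bar\nu_{eL}\gamma^\mu e_L + W^-_\mu\bar e_L\gamma^\mu\nu_{eL}\right) + \frac{1}{\sin\theta_W\cos\theta_W}Z_\mu J^\mu_{NC}\right), \tag{2.110} $$

where $J^\mu_{EM}$ and $J^\mu_{NC}$ are the electromagnetic and neutral currents respectively and the parameter $M_h$ is defined as

$$ M_h^2 = 2\lambda v^2. \tag{2.111} $$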

Examining the Lagrangian density, we can read off the particle content of the theory. We have a massless photon field $A_\mu$; three massive vector bosons, $W^\pm$ and $Z$; a massless left-handed neutrino; a massive electron; and a massive neutral boson with spin zero and mass $M_h$, the Higgs particle. In the above Lagrangian we identify the following parameters

$$ e,\ \sin\theta_W,\ m_e,\ M_W,\ M_h, \tag{2.112} $$

where the mass parameter $M_Z$ can be expressed in terms of $M_W$ and $\theta_W$ using relation 2.106.

Our objective was to show the mechanism by which the fermion mass terms and the Higgs mass term are introduced in the context of the Standard Model. We can see that for the consistency of the theory the existence of a massive Higgs is a necessity. This mechanism also gives masses to the leptons. Extension of the electroweak theory to quarks will give masses to quarks in an analogous fashion. We will not discuss this in detail, but outline the steps which lead to the introduction of the massive quarks, including the top quark.

2.7    Extension  of  the  Electroweak  Theory  to  Quarks 

We can extend the electroweak theory to include quark fermions. We proceed, as before, to arrange all left-handed quarks as weak isodoublets and all right-handed quarks as singlets. The hypercharges are selected such that the quantity $T_3 + Y = Q$ is in agreement with experimentally observed charges. This scheme requires the existence of the top quark, $t$. Although the mass of the top quark (or any quark) is not predicted by the Standard Model, the model gives a clear picture as to how these quarks and other fundamental particles interact. Therefore, discovery of the top quark and studying its properties would provide a confirmation of the theory. Of course, discovery of the Higgs boson would provide the ultimate confirmation, since the theory depends on its existence.

We include quarks in the Lagrangian by combining all the fermions into a total spinor $\psi$:

$$ \psi = \begin{pmatrix}\vdots\\ b_R\end{pmatrix} \tag{2.113} $$

The corresponding representation matrices of the $SU(2)\times U(1)$ algebra are again denoted by $T_a$ (with $a = 1,2,3$) and $Y$. The covariant derivative of the Fermi fields is then given, according to the rule of minimal substitution, by

$$ D_\mu\psi = \left(\partial_\mu + igW^a_\mu T_a + ig'B_\mu Y\right)\psi \tag{2.114} $$

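The Yukawa term can be extended to include the coupling of the Higgs field to all of the fermions. The most general expression for this coupling that is compatible with fermion quantum numbers is

$$ \mathcal{L}_{Yuk}(x) = \bar\psi(x)\phi_i(x)C_i\psi(x) + \mathrm{h.c.} \tag{2.115} $$

where $C_i$ (with $i = 1,2$) are any complex matrices acting on the flavor index of $\psi$. We will explore the Yukawa term and the matrices $C$ in more detail below. We can now write the Lagrangian density.

$$ \mathcal{L} = -\frac{1}{2}\mathrm{Tr}\left(W_{\mu\nu}W^{\mu\nu}\right) - \frac{1}{4}B_{\mu\nu}B^{\mu\nu} + \bar\psi\,i\gamma^\mu D_\mu\psi + \mathcal{L}_{Yuk} + (D_\mu\phi)^\dagger(D^\mu\phi) - V(\phi). \tag{2.116} $$

In terms of the $h$ field, the Lagrangian becomes

$$ \mathcal{L} = -\frac{1}{2}\mathrm{Tr}\left(W_{\mu\nu}W^{\mu\nu}\right) - \frac{1}{4}B_{\mu\nu}B^{\mu\nu} + \bar\psi\,i\gamma^\mu D_\mu\psi + \mathcal{L}_{Yuk} $$
$$ \phantom{\mathcal{L} =} + W^+_\mu W^{-\mu}M_W^2\left(1 + \frac{h}{v}\right)^2 + \frac{1}{2}Z_\mu Z^\mu M_Z^2\left(1 + \frac{h}{v}\right)^2 $$
$$ \phantom{\mathcal{L} =} + \frac{1}{2}\partial_\mu h\,\partial^\mu h - \frac{1}{2}M_h^2h^2\left(1 + \frac{h}{v} + \frac{h^2}{4v^2}\right). \tag{2.117} $$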


The coupling of the fermions to the gauge bosons is given by the covariant derivative of the fermions (2.114). The interaction portion of the Lagrangian is

$$ \mathcal{L}_{int} = -\bar\psi\gamma^\mu\left(gW^a_\mu T_a + g'B_\mu Y\right)\psi. \tag{2.118} $$

Expressed in terms of the physical boson fields $W^\pm_\mu$, $Z_\mu$, and $A_\mu$ this gives

$$ \mathcal{L}_{int} = -e\left\{A_\mu J^\mu_{EM} + \frac{1}{\sin\theta_W\cos\theta_W}Z_\mu J^\mu_{NC} + \frac{1}{\sqrt{2}\sin\theta_W}\left(W^+_\mu J^\mu_{CC} + W^-_\mu J^{\mu\dagger}_{CC}\right)\right\}, \tag{2.119} $$

where

$$ J^\mu_{EM} = \bar\psi\gamma^\mu(T_3 + Y)\psi, \tag{2.120} $$
$$ J^\mu_{CC} = \bar\psi\gamma^\mu(T_1 + iT_2)\psi \tag{2.121} $$
$$ J^\mu_{NC} = \bar\psi\gamma^\mu\left(T_3 - \sin^2\theta_W(T_3 + Y)\right)\psi = \bar\psi\gamma^\mu T_3\psi - \sin^2\theta_W J^\mu_{EM}. \tag{2.122} $$

Here $J_{EM}$, $J_{CC}$, and $J_{NC}$ are the electromagnetic current, charged current, and neutral current respectively. These current terms define the electroweak interactions (interaction vertices). The electric charge matrix, $Q$, is as before

$$ Q = T_3 + Y. $$

Hence, the gauge group of electromagnetism is exactly the $U(1)$ gauge group that leaves the vacuum expectation value of $\phi$ invariant, and therefore remains unbroken. The hypercharge values of the fermions in Table 2.2 are chosen such that the electromagnetic current is a pure vector current and yields the experimentally observed charges.

2.8    Quark  Masses  and  Mixing 


In analogy to leptons, quark masses are generated by Yukawa couplings to the Higgs scalars. The fundamental weak eigenstates of the unbroken gauge theory can be written as

$$ Q_{jL} = \begin{pmatrix}u_j\\ d_j\end{pmatrix}_L,\quad u_{jR},\quad d_{jR},\quad (j = 1,2,3), $$

where $Q_{jL}$ is an $SU(2)$ doublet with $Y = \frac{1}{6}$ and $u_{jR}$, $d_{jR}$ are $SU(2)$ singlets with $Y = \frac{2}{3}, -\frac{1}{3}$ respectively. The kinetic part of the new Lagrangian due to the quarks would be analogous to that of the leptons. The Yukawa term which leads to the production of the quark masses requires further elaboration. To generate quark masses we need not only the doublet $\phi$ with $Y = \frac{1}{2}$, but also the conjugate multiplet

$$ \tilde\phi = i\tau_2\phi^* = \begin{pmatrix}\phi_2^*\\ -\phi_1^*\end{pmatrix} $$

which transforms as a doublet with $Y = -\frac{1}{2}$. The most general $SU(2)\times U(1)$ invariant Yukawa interaction involving just the quarks has the form of

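$$ \mathcal{L}_{Yuk}(Q) = \sum_{i=1}^{3}\sum_{j=1}^{3}\left[\tilde C_{ij}\,\bar u_{iR}\left(\tilde\phi^\dagger Q_{jL}\right) + C_{ij}\,\bar d_{iR}\left(\phi^\dagger Q_{jL}\right)\right] + \mathrm{h.c.} \tag{2.123} $$

where we have allowed inter-generation couplings. This interaction depends on 18 different complex couplings $\tilde C_{ij}$, $C_{ij}$. From the explicit form of $\phi$ and $\tilde\phi$, we obtain mass terms for the charge $\frac{2}{3}$ and charge $-\frac{1}{3}$ quarks,

$$ \left(\bar u_1, \bar u_2, \bar u_3\right)_R M^u\begin{pmatrix}u_1\\ u_2\\ u_3\end{pmatrix}_L + \mathrm{h.c.} $$

$$ \left(\bar d_1, \bar d_2, \bar d_3\right)_R M^d\begin{pmatrix}d_1\\ d_2\\ d_3\end{pmatrix}_L + \mathrm{h.c.} $$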


where $M^u_{ij} = \frac{v}{\sqrt{2}}\tilde C_{ij}$ and $M^d_{ij} = \frac{v}{\sqrt{2}}C_{ij}$ are quark mass matrices in generation space, each depending on 9 complex parameters. These matrices are, in general, not hermitian.

    Multiplet                      T       T_3       Y        Q
    nu_eL, nu_muL, nu_tauL        1/2     +1/2     -1/2       0
    e_L,   mu_L,   tau_L          1/2     -1/2     -1/2      -1
    e_R,   mu_R,   tau_R           0        0       -1       -1
    u_L,   c_L,    t_L            1/2     +1/2     +1/6     +2/3
    d_L,   s_L,    b_L            1/2     -1/2     +1/6     -1/3
    u_R,   c_R,    t_R             0        0      +2/3     +2/3
    d_R,   s_R,    b_R             0        0      -1/3     -1/3

Table 2.2: The flavor quantum numbers of the leptons and quarks.

Any unitary transformation on the quark fields will preserve their anticommutation relations. Moreover, any complex matrix can be transformed to a diagonal matrix by multiplying it on the left and right by appropriate unitary matrices. Thus, by unitary transformation on the fundamental quark states of the unbroken electroweak theory,


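$$ \begin{pmatrix}u_1\\ u_2\\ u_3\end{pmatrix}_{L,R} = U_{L,R}\begin{pmatrix}u\\ c\\ t\end{pmatrix}_{L,R},\qquad \begin{pmatrix}d_1\\ d_2\\ d_3\end{pmatrix}_{L,R} = D_{L,R}\begin{pmatrix}d\\ s\\ b\end{pmatrix}_{L,R}, $$

we can transform $M^u$ and $M^d$ to diagonal forms

$$ U_R^{-1}M^uU_L = \begin{pmatrix}m_u & 0 & 0\\ 0 & m_c & 0\\ 0 & 0 & m_t\end{pmatrix} $$

$$ D_R^{-1}M^dD_L = \begin{pmatrix}m_d & 0 & 0\\ 0 & m_s & 0\\ 0 & 0 & m_b\end{pmatrix} $$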
where $U_R$, $U_L$, $D_R$ and $D_L$ are unitary matrices and the diagonal entries are the quark masses. The weak eigenstates $u_1, u_2, u_3$ are linear superpositions of the mass eigenstates $u, c, t$ and likewise $d_1, d_2, d_3$ are superpositions of $d, s, b$. Recall the form of the kinetic term in the Lagrangian, eq 2.69 and eq 2.70. Therefore, we should have bilinear terms $\bar u_{1L}\gamma^\mu d_{1L}$, $\bar u_{2L}\gamma^\mu d_{2L}$, $\bar u_{3L}\gamma^\mu d_{3L}$ whose sum can be represented as an inner product of vectors in generation space.

$$ \left(\bar u_1, \bar u_2, \bar u_3\right)_L\gamma^\mu\begin{pmatrix}d_1\\ d_2\\ d_3\end{pmatrix}_L = \left(\bar u, \bar c, \bar t\right)_LU_L^\dagger D_L\gamma^\mu\begin{pmatrix}d\\ s\\ b\end{pmatrix}_L $$

Therefore, there will generally be generation mixing of the mass eigenstates, described by the matrix

$$ V = U_L^\dagger D_L. $$

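Again, from the form of equations 2.69 and 2.70 we can identify the following bilinear for the quarks

$$ \left(\bar u_1, \bar u_2, \bar u_3\right)_L\gamma^\mu\begin{pmatrix}u_1\\ u_2\\ u_3\end{pmatrix}_L = \left(\bar u, \bar c, \bar t\right)_LU_L^\dagger U_L\gamma^\mu\begin{pmatrix}u\\ c\\ t\end{pmatrix}_L $$

but since $U_L^\dagger U_L = 1$, there is no mixing in this case. The $u_R$, $d_L$, $d_R$ bilinear current terms are similarly unmixed.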

Like the mass matrix itself, the mixing of quark flavors in the charged-current weak interaction has no fundamental explanation here, though theoretical attempts to predict the mixing angles have been made. If all quark masses were zero (or equal), weak mixing phenomena would not exist. In terms of the above general mixing matrix $V$, the charged weak currents for quarks are


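$$ J^\mu_{CC} = \left(\bar u, \bar c, \bar t\right)_L\gamma^\mu V\begin{pmatrix}d\\ s\\ b\end{pmatrix}_L \tag{2.124} $$

By convention, the mixing effect is only given to the $T_3 = -\frac{1}{2}$ states by defining

$$ \begin{pmatrix}d'\\ s'\\ b'\end{pmatrix} = V\begin{pmatrix}d\\ s\\ b\end{pmatrix} \tag{2.125} $$

Then the quark weak eigenstates are

$$ \begin{pmatrix}u\\ d'\end{pmatrix}_L,\quad \begin{pmatrix}c\\ s'\end{pmatrix}_L,\quad \begin{pmatrix}t\\ b'\end{pmatrix}_L \tag{2.126} $$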

The unitary $3\times3$ matrix $V$ can be specified by 9 independent parameters; the 18 real parameters of a general complex $3\times3$ matrix are reduced to 9 by the unitarity requirement $V^\dagger_{\alpha\beta}V_{\beta\gamma} = \delta_{\alpha\gamma}$. However, we have the freedom to absorb a phase into each left-handed field,

$$ q_L \to e^{i\alpha(q)}q_L, $$

which removes an arbitrary phase from each row or column of $V$, reducing the degrees of freedom. But since $V$ is unchanged by a common phase transformation of all the $q_L$, only $6 - 1 = 5$ phase degrees of freedom can be removed in this way. Therefore, $V$ can be expressed in terms of only $9 - 5 = 4$ physically independent parameters. The mixing matrix can be parameterized by a product of three rotation matrices $R$ and a phase insertion matrix $D$ as

$$ V = R_2(-\theta_2)\,R_1(-\theta_1)\,D(\delta - \pi)\,R_2(\theta_3), \tag{2.127} $$


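where

$$ R_1(\theta_i) = \begin{pmatrix}c_i & s_i & 0\\ -s_i & c_i & 0\\ 0 & 0 & 1\end{pmatrix},\qquad R_2(\theta_i) = \begin{pmatrix}1 & 0 & 0\\ 0 & c_i & s_i\\ 0 & -s_i & c_i\end{pmatrix}, $$

$$ D(\delta) = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & e^{i\delta}\end{pmatrix}, \tag{2.128} $$

and $c_i = \cos\theta_i$, $s_i = \sin\theta_i$. This construction leads to the Cabibbo-Kobayashi-Maskawa (CKM) form

$$ V = \begin{pmatrix}c_1 & -s_1c_3 & -s_1s_3\\ s_1c_2 & c_1c_2c_3 - s_2s_3e^{i\delta} & c_1c_2s_3 + s_2c_3e^{i\delta}\\ s_1s_2 & c_1s_2c_3 + c_2s_3e^{i\delta} & c_1s_2s_3 - c_2c_3e^{i\delta}\end{pmatrix} \tag{2.129} $$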

By suitable choice of the signs of the quark fields, we can restrict the angles to the ranges

$$ 0 \le \theta_i \le \pi/2, \qquad 0 \le \delta \le 2\pi. $$
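
The product form of the mixing matrix can be compared numerically with its explicit CKM form. The sketch below is an illustration under stated assumptions: it takes $V = R_2(-\theta_2)R_1(-\theta_1)D(\delta - \pi)R_2(\theta_3)$, with $R_1$ a rotation in the 1-2 plane, $R_2$ a rotation in the 2-3 plane and $D(\delta) = \mathrm{diag}(1, 1, e^{i\delta})$; the angle values are arbitrary, not measured ones, and all function names are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def r1(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0], [-s, c, 0], [0, 0, 1]]


def r2(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]


def phase(delta):
    return [[1, 0, 0], [0, 1, 0], [0, 0, cmath.exp(1j * delta)]]


def ckm_product(t1, t2, t3, delta):
    """V as the product of rotations and a phase insertion, Eq. (2.127)."""
    m = matmul(r2(-t2), r1(-t1))
    m = matmul(m, phase(delta - math.pi))
    return matmul(m, r2(t3))


def ckm_closed_form(t1, t2, t3, delta):
    """V written out explicitly, Eq. (2.129)."""
    c1, s1 = math.cos(t1), math.sin(t1)
    c2, s2 = math.cos(t2), math.sin(t2)
    c3, s3 = math.cos(t3), math.sin(t3)
    e = cmath.exp(1j * delta)
    return [
        [c1, -s1 * c3, -s1 * s3],
        [s1 * c2, c1 * c2 * c3 - s2 * s3 * e, c1 * c2 * s3 + s2 * c3 * e],
        [s1 * s2, c1 * s2 * c3 + c2 * s3 * e, c1 * s2 * s3 - c2 * c3 * e],
    ]


def dagger(m):
    return [[complex(m[j][i]).conjugate() for j in range(3)] for i in range(3)]


def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
angles = (0.23, 0.02, 0.015, 1.2)  # illustrative (theta1, theta2, theta3, delta)
v_ckm = ckm_product(*angles)
print(max_abs_diff(v_ckm, ckm_closed_form(*angles)))          # product vs explicit form
print(max_abs_diff(matmul(dagger(v_ckm), v_ckm), IDENTITY))   # unitarity check
```

Both printed differences should vanish up to rounding. Setting $\theta_2 = \theta_3 = 0$ and $\delta = \pi$ in `ckm_product` reproduces `r1(-theta1)`, the two-generation Cabibbo rotation.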

The phase $\delta$ gives rise to CP-violating effects. An alternative form of the CKM matrix sometimes used in the literature replaces $\delta$ above by $\delta + \pi$ and $\theta_1$ by $-\theta_1$. The CKM parameters are not predicted by the standard model. In the limit $\theta_2 = \theta_3 = 0$, $\delta = \pi$, $V$ reduces to the Cabibbo rotation

$$ V = R_1(-\theta_1) = \begin{pmatrix}c_1 & -s_1 & 0\\ s_1 & c_1 & 0\\ 0 & 0 & 1\end{pmatrix} \tag{2.130} $$

which mixes the first and second generations only.

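We now apply the preceding results to reduce the Yukawa coupling to a canonical form. In the unitary gauge the Yukawa sector becomes

$$ \mathcal{L}_{Yuk} = -\left(1 + \frac{h}{v}\right)\frac{v}{\sqrt{2}}\left[\left(\bar e, \bar\mu, \bar\tau\right)_R\begin{pmatrix}c_e & 0 & 0\\ 0 & c_\mu & 0\\ 0 & 0 & c_\tau\end{pmatrix}\begin{pmatrix}e\\ \mu\\ \tau\end{pmatrix}_L + \left(\bar u, \bar c, \bar t\right)_R\begin{pmatrix}c_u & 0 & 0\\ 0 & c_c & 0\\ 0 & 0 & c_t\end{pmatrix}\begin{pmatrix}u\\ c\\ t\end{pmatrix}_L \right. $$
$$ \left. \phantom{\mathcal{L}_{Yuk} =} + \left(\bar d, \bar s, \bar b\right)_R\begin{pmatrix}c_d & 0 & 0\\ 0 & c_s & 0\\ 0 & 0 & c_b\end{pmatrix}\begin{pmatrix}d\\ s\\ b\end{pmatrix}_L\right] + \mathrm{h.c.} \tag{2.131} $$

It is easy to identify the mass terms

$$ m_e = c_e\frac{v}{\sqrt{2}},\ \ldots,\ m_b = c_b\frac{v}{\sqrt{2}}. $$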

Because of the CKM matrix, $V$, the quark fields $d'$, $s'$, and $b'$ do not correspond to particles of well-defined mass. The fields $d'$, $s'$, and $b'$ are defined as isospin partners of the fields $u$, $c$, and $t$. The fields $d$, $s$, and $b$, which do correspond to quarks of well-defined mass, are given by


(2.132) 


y 


(2.133) 


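We now investigate the physical consequence of the generalized Cabibbo rotation. Since the quark charged current, $J_{CC}$, contains the isospin raising operator, $(T_1 + iT_2)$, the fields $u$, $c$, and $t$ are connected with their isospin partners $d'$, $s'$, and $b'$:

$$ J^\mu_{CC} = \left(\bar\nu_e, \bar\nu_\mu, \bar\nu_\tau\right)_L\gamma^\mu\begin{pmatrix}e\\ \mu\\ \tau\end{pmatrix}_L + \left(\bar u, \bar c, \bar t\right)_L\gamma^\mu\begin{pmatrix}d'\\ s'\\ b'\end{pmatrix}_L $$
$$ \phantom{J^\mu_{CC}} = \left(\bar\nu_e, \bar\nu_\mu, \bar\nu_\tau\right)_L\gamma^\mu\begin{pmatrix}e\\ \mu\\ \tau\end{pmatrix}_L + \left(\bar u, \bar c, \bar t\right)_L\gamma^\mu V\begin{pmatrix}d\\ s\\ b\end{pmatrix}_L \tag{2.134} $$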


Thus, the presence of the matrix $V$ in the charged current allows flavor changing processes such as

$$ d \to u + W^- \quad (V_{11}), $$
$$ s \to u + W^- \quad (V_{12}), $$
$$ b \to u + W^- \quad (V_{13}). \tag{2.135} $$

The amplitude of each process is proportional to the matrix element of $V$ indicated in the brackets. The transition $d \to u$, for example, is responsible for the decay of the neutron.

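Let us now investigate the neutral current in more detail. On account of the
unitarity of the CKM matrix, V, we find from 2.122

\[
\begin{aligned}
J_{NC}^{\mu} ={}& (\bar\nu_e, \bar\nu_\mu, \bar\nu_\tau)\, \gamma^\mu\, \tfrac{1}{2} P_L
\begin{pmatrix} \nu_e \\ \nu_\mu \\ \nu_\tau \end{pmatrix}
+ (\bar e, \bar\mu, \bar\tau)\, \gamma^\mu \left( -\tfrac{1}{2} P_L + \sin^2\theta_W \right)
\begin{pmatrix} e \\ \mu \\ \tau \end{pmatrix} \\
&+ (\bar u, \bar c, \bar t)\, \gamma^\mu \left( \tfrac{1}{2} P_L - \tfrac{2}{3} \sin^2\theta_W \right)
\begin{pmatrix} u \\ c \\ t \end{pmatrix}
+ (\bar d, \bar s, \bar b)\, \gamma^\mu \left( -\tfrac{1}{2} P_L + \tfrac{1}{3} \sin^2\theta_W \right)
\begin{pmatrix} d \\ s \\ b \end{pmatrix}
\end{aligned}
\]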

where $P_L = \tfrac{1}{2}(1 - \gamma_5)$. Note that there are no flavor changing terms here. For
example, there is no transition of the form

\[
s \to d + Z.
\]

Such a transition could give rise to reactions like

\[
K_L^0 \sim (s\bar d + \bar s d) \to \mu^+ \mu^-.
\]


Experimentally, such processes are strongly suppressed. The Standard Model thus
explains, in a natural way, both the flavor changing in interactions involving the
charged current and its absence in interactions involving the neutral current.
This is an example of why the Standard Model has been so successful.


2.9    Quantum  Chromodynamics,  QCD 

So far in this chapter, we have only considered the electroweak interaction
and have ignored the strong force. We can now include the strong force in the
standard model. We start by combining the fundamental fermions shown in Table
2.2 into a total spinor, $\psi$. Along with the $SU(2) \times U(1)$ group of the electroweak
interaction, we also consider the color-SU(3) group of the strong interactions. With
respect to this SU(3) group, all leptons are singlets, since they do not participate
in the strong interactions, and all quarks are triplets. The space of the spinors
$\psi$ thus also carries a nontrivial representation of the color-SU(3) group. The
generating operators of this highly reducible representation will be denoted by
$F_a$ ($a = 1, \ldots, 8$). When $F_a$ is applied to the spinor $\psi$, it multiplies all lepton
fields by zero and all quark fields by $\lambda_a/2$:


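\[
F_a \begin{pmatrix} \nu_e \\ \vdots \\ b_R \end{pmatrix}
= \begin{pmatrix} 0 \\ \vdots \\ \frac{\lambda_a}{2}\, b_R \end{pmatrix}
\tag{2.136}
\]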

The generators of the color-SU(3) group obviously commute with the generators
$T_a$ ($a = 1, 2, 3$) and Y of the $SU(2) \times U(1)$ group of the electroweak interaction.
Together $F_a$, $T_a$, and Y generate the group

\[
\mathcal{G} = SU(3) \times SU(2) \times U(1),
\tag{2.137}
\]




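which is the direct product of the color group SU(3), the group SU(2) of the
weak isospin, and the group U(1) of the weak hypercharge. This group $\mathcal{G}$ is the
fundamental group that is gauged in the standard model, whereby the strong and
electroweak forces are generated. Including the new gauge field in the Lagrangian
density, we have

\[
\mathcal{L} = -\tfrac{1}{2} \mathrm{Tr}\left( G_{\mu\nu} G^{\mu\nu} \right)
- \tfrac{1}{2} \mathrm{Tr}\left( W_{\mu\nu} W^{\mu\nu} \right)
- \tfrac{1}{4} B_{\mu\nu} B^{\mu\nu}
+ \bar\psi\, i\gamma^\mu D_\mu \psi
+ (D_\mu \phi)^\dagger (D^\mu \phi)
+ \mathcal{L}_{\mathrm{Yuk}} + V(\phi).
\tag{2.138}
\]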

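Here $G_{\mu\nu}$ is the gluon field-strength tensor given by

\[
G_{\mu\nu}(x) = \partial_\mu G_\nu(x) - \partial_\nu G_\mu(x) + i g_s \left[ G_\mu(x), G_\nu(x) \right],
\tag{2.139}
\]

where

\[
G_{\mu\nu}(x) = G^a_{\mu\nu}(x) \frac{\lambda_a}{2},
\tag{2.140}
\]
\[
G_\mu(x) = G^a_\mu(x) \frac{\lambda_a}{2} = G^\dagger_\mu(x),
\tag{2.141}
\]
\[
\mathrm{Tr}\, G_\mu(x) = 0,
\tag{2.142}
\]

and

\[
G^a_{\mu\nu}(x) = \partial_\mu G^a_\nu(x) - \partial_\nu G^a_\mu(x) - g_s f_{abc}\, G^b_\mu(x)\, G^c_\nu(x)
\tag{2.143}
\]

with ($a = 1, \ldots, 8$). The $\lambda_a$ are the Gell-Mann $\lambda$-matrices, which are the generators
of the SU(3) group, and $f_{abc}$ are the corresponding structure constants.
The extended covariant derivative is:

\[
D_\mu \psi = \left( \partial_\mu + i g_s G^a_\mu F_a + i g W^a_\mu T_a + i g' B_\mu Y \right) \psi.
\]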
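
The relation between the commutator term of Eq. 2.139 and the structure-constant term of Eq. 2.143 rests on $[\lambda_a/2, \lambda_b/2] = i f_{abc} \lambda_c/2$. The sketch below checks this numerically. It assumes the standard explicit representation of the Gell-Mann matrices (not written out in the text) and extracts $f_{abc}$ from traces using $\mathrm{Tr}(\lambda_a \lambda_b) = 2\delta_{ab}$; the function names are illustrative.

```python
import numpy as np

def gell_mann():
    """The eight Gell-Mann matrices (standard representation) as an (8, 3, 3) array."""
    lam = np.zeros((8, 3, 3), dtype=complex)
    lam[0] = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    lam[1] = [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]]
    lam[2] = [[1, 0, 0], [0, -1, 0], [0, 0, 0]]
    lam[3] = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
    lam[4] = [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]]
    lam[5] = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
    lam[6] = [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]]
    lam[7] = np.diag([1, 1, -2]) / np.sqrt(3)
    return lam

def structure_constants(lam):
    """f_abc = Tr([l_a, l_b] l_c) / (4i), which follows from Tr(l_a l_b) = 2 delta_ab."""
    f = np.zeros((8, 8, 8))
    for a in range(8):
        for b in range(8):
            comm = lam[a] @ lam[b] - lam[b] @ lam[a]
            for c in range(8):
                f[a, b, c] = np.real(np.trace(comm @ lam[c]) / 4j)
    return f

lam = gell_mann()
f = structure_constants(lam)

# Tr(lambda_a) = 0, as required for Eq. 2.142 to hold for any gluon field.
traceless = np.allclose(np.trace(lam, axis1=1, axis2=2), 0)

# [lambda_a, lambda_b] = 2i f_abc lambda_c for all a, b.
comm = (np.einsum('aij,bjk->abik', lam, lam)
        - np.einsum('bij,ajk->abik', lam, lam))
algebra_closes = np.allclose(comm, 2j * np.einsum('abc,cik->abik', f, lam))
```

With zero-based indices, `f[0, 1, 2]` corresponds to $f_{123}$ and `f[3, 4, 7]` to $f_{458}$.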

We regard the Higgs field $\phi$ as a color singlet. Thus the Higgs part of the
Lagrangian and the Yukawa sector, $\mathcal{L}_{\mathrm{Yuk}}$, remain the same. The mechanism of the
spontaneous symmetry breaking of the $SU(2) \times U(1)$ part of the gauge group $\mathcal{G}$ is
exactly the same as before, while the SU(3) part of the group $\mathcal{G}$ remains unbroken.
In terms of the physical fields the Lagrange density takes the explicit form:


£  = 


-l-Tr  (G^Gn  ~  \tt  (W^wn  -  \b„vB^ 
+WtW-»M*w  (l  +  lj  +  \z,Z'*t  (l  +  *y 

+  E 


ft' 


a 


tTM^  +  i&G^    -ro,   1  + 


+  1-dtihd»h  -  l-m\h2 


h 

1  +  -  + 

en 

V 

V-^EM 
C 


sin  t/w  cos  c^h/ 


(2.144) 


\/2sin  0^ 

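The indices $\ell = e, \mu, \tau$ and $q = u, d, s, c, b, t$ are to be summed over. The current
terms are

\[
J^\mu_{EM} = \bar\psi \gamma^\mu \left( T_3 + Y \right) \psi,
\tag{2.145}
\]
\[
J^\mu_{NC} = \bar\psi \gamma^\mu \left( T_3 - \sin^2\theta_W (T_3 + Y) \right) \psi,
\tag{2.146}
\]
\[
J^\mu_{CC} = \bar\psi \gamma^\mu \left( T_1 + i T_2 \right) \psi.
\tag{2.147}
\]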


In particular, the electromagnetic current has the following explicit form:

\[
J^\mu_{EM} = -\sum_{\ell} \bar\psi_\ell \gamma^\mu \psi_\ell + \sum_{q} Q_q\, \bar\psi_q \gamma^\mu \psi_q,
\tag{2.148}
\]

where $Q_q$ are the quark charges.

The gauge symmetry that is manifest in the above Lagrangian corresponds to
the unbroken part $SU(3) \times U_{em}(1)$ of the full gauge group $\mathcal{G}$. This can be shown
schematically as follows:

\[
\mathcal{G} = SU(3) \times SU(2) \times U(1) \to SU(3) \times U_{em}(1),
\tag{2.149}
\]

where the arrow indicates spontaneous symmetry breaking.

Having reviewed the Standard Model, let us recall our objectives. Our goal was
to investigate methods for detecting the Top quark and the Higgs boson. It is
because of the great success of the Standard Model so far that we are optimistic
about finding the Higgs boson. The discovery of the Top quark has recently been
announced, and our technique discussed in chapter 5 can be used to further enhance
its signal. Except for the Higgs boson, all of the Standard Model parameters have
been experimentally observed. For reference, we list here the Standard Model
parameters that we have seen so far.

3 coupling constants    $g_s$, $e$, $\sin\theta_W$
2 boson masses          $M_W$, $m_h$
3 lepton masses         $m_e$, $m_\mu$, $m_\tau$
6 quark masses          $m_u$, $m_d$, $m_c$, $m_s$, $m_t$, $m_b$
4 CKM parameters        $\theta_1$, $\theta_2$, $\theta_3$, $\delta$        (2.150)

The mass of the Z boson can be expressed in terms of $M_W$ and $\theta_W$ via equation
2.106. Therefore, there are a total of 18 parameters, a figure which does not include
the neutrino masses, which are assumed to be zero. Such a number of independent
parameters would seem to be rather high for a truly fundamental theory. If the
neutrino is found to have mass, then this number increases even further because
in this case one needs to consider a similar CKM matrix in the lepton sector.




2.10    Properties  of  the  Top  Quark 

Let us first look at some general properties of the heavy quarks. A hadron
containing a heavy quark can undergo weak decay, where this heavy quark turns into
a lighter quark. Since the energy released by the heavy quark is much bigger than
the typical quark binding energy, it is plausible to assume that the heavy quark
decays independently of the other constituents. The other constituents act as
passive spectators. These quarks, together with possible other quark-antiquark pairs,
form hadrons with unit probability [16, 17]. This is the "spectator approximation".

Because of the nearly diagonal character of the quark mixing matrix, the most
favored route for a heavy quark decay is either to the same generation (e.g., $c \to s$)
or, if this is kinematically impossible, to the nearest generation (e.g., $b \to c$). As a
result, heavy quark decays go preferentially via a "cascade"

\[
\begin{aligned}
c &\to s \\
b &\to c \to s \\
t &\to b \to c \to s
\end{aligned}
\]

with (virtual) W emission at each stage. One consequence is multilepton
production, because at each step of the cascade it is possible to produce an
$\ell^+ \nu_\ell$ or $\ell^- \bar\nu_\ell$ pair. The full list of possibilities is

\[
\begin{aligned}
t &\to b \; (e\nu, \mu\nu, \tau\nu, cs, u\bar d) \\
b &\to c \; (e\nu, \mu\nu, \tau\nu, cs, ud) \\
c &\to s \; (e\nu, \mu\nu, u\bar d) \\
\tau &\to \nu \; (e\nu, \mu\nu, ud)
\end{aligned}
\]

where each step involves even more options. Another consequence of cascade
decays is more secondary decay vertices close to the primary quark production vertex.
Bottom quarks typically give two such vertices, one from b and one from c decay,
but with modes like $b \to c\tau\nu$ there is a third vertex. Top quarks are too short-lived,
and it is, therefore, very difficult to resolve their production and decay vertices.
These multiple decay vertices provide a valuable way to identify heavy-quark events
experimentally.

The kinematics of the decay of a hadron (tq) or (tqq) containing a heavy top quark,
t, are essentially the kinematics of t decay. The dominant feature of t decay is its
large energy release. As a consequence, the decay products have large invariant
mass. In principle, for a hadron containing a heavy top quark, one can identify
the mass of the quark using the four-momenta of its decay products (i.e.,
$(\sum_i p_i)^2 = m_t^2$). In practice, however, there are usually many hadrons in an
event, which makes it difficult to identify those which come from hadronic decays of t.
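
To make the relation $(\sum_i p_i)^2 = m_t^2$ concrete, the following minimal sketch generates an idealized two-body decay at rest and reconstructs the parent mass from the daughter four-momenta. The masses, the decay direction, and the helper names `two_body_decay` and `invariant_mass` are illustrative assumptions; detector effects and the combinatorial problem mentioned above are ignored.

```python
import numpy as np

def invariant_mass(four_momenta):
    """Invariant mass of a set of particles; rows are (E, px, py, pz) in GeV."""
    total = np.sum(np.asarray(four_momenta, dtype=float), axis=0)
    m2 = total[0] ** 2 - np.dot(total[1:], total[1:])
    return np.sqrt(max(m2, 0.0))

def two_body_decay(parent_mass, m1, m2, direction):
    """Daughter four-momenta for a parent of given mass decaying at rest."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    # Magnitude of the daughter momentum in the parent rest frame.
    p = np.sqrt((parent_mass**2 - (m1 + m2)**2)
                * (parent_mass**2 - (m1 - m2)**2)) / (2.0 * parent_mass)
    e1 = np.sqrt(m1**2 + p**2)
    e2 = np.sqrt(m2**2 + p**2)
    return np.array([[e1, *(p * n)], [e2, *(-p * n)]])

# Illustrative masses in GeV (assumed values, not taken from the text).
M_TOP, M_W, M_B = 175.0, 80.4, 4.7

daughters = two_body_decay(M_TOP, M_W, M_B, direction=[0.3, -0.5, 0.8])
reconstructed = invariant_mass(daughters)
```

In this idealized case `reconstructed` agrees with `M_TOP` up to rounding; in a real event one would first have to decide which final-state particles to include in the sum.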


CHAPTER  3 

NEURAL  NETWORKS  AND  FISHER  DISCRIMINANTS 

3.1    Artificial  Neural  Networks 

Artificial Neural Networks, ANN, provide an emerging paradigm for pattern
recognition implementation [18] that involves large interconnected networks of
relatively simple and typically nonlinear units called processing units (neurons) [19, 20].
A very important feature of these Artificial Neural Networks is their adaptive
nature, where learning by examples replaces programming in solving problems. This
feature makes such computational models very appealing in application domains
such as high energy physics, where training data is readily available. Other key
features are the intrinsically parallel architecture and the non-linear internal
processing of the input variables, which allow for fast computation of the solutions
while taking into account higher-order dependencies of the feature function on the
input variables. The Fisher Discriminant, on the other hand, relies only on linear
dependencies. It provides a feature function that maps the n-dimensional space to a
one-dimensional space suitable for data classification. It achieves that by rotating the
variable space in such a way that a certain physical observable is optimized. This
method will be discussed in more detail in later chapters. Of these two techniques,
ANN provides the more general approach and therefore has an edge over the Fisher
Discriminant method, especially if the variables are highly correlated.

From a statistical modeling point of view, ANN models belong to the general class
of non-parametric methods that do not make any assumption about the parametric
form of the function they model. In this sense they are more powerful than
parametric methods that try to fit reality into a specific parametric form. However,
non-parametric methods like ANN contain more free parameters and hence require
more training data than parametric ones require for fitting, in order to achieve good
generalization performance. Fortunately, for most high energy problems one has
access to a big data sample, making it possible to exploit the capabilities of
non-parametric models like ANN. There have been extensive tests of ANNs versus
standard methods on pattern recognition in high energy physics problems, and they
all seem to favor ANNs [20, 21]. Ultimately, the choice of method depends on the
nature of the problem under investigation. Is the problem complex enough to call
for a non-parametric method like ANN? Is data easily available? Is time not a big
consideration? To help answer these questions, we look at the characteristics of
the most commonly used ANN technique in the physics community, namely the
Feed-Forward Neural Network with the gradient descent algorithm. This technique
will be discussed in more detail in the following chapters.

In principle, ANNs can compute any computable function, i.e. they can do
everything a normal digital computer can do. In particular, anything that can be
represented as a mapping between vector spaces can be approximated to arbitrary
precision by Feed-Forward Neural Networks. In practice, ANNs are especially
useful for mapping problems which are tolerant of some errors and have lots of example
data available, but to which hard and fast rules cannot easily be applied. Artificial
Neural Networks such as Feed-Forward Neural Networks have a number of features
that make them particularly attractive.

•  An ANN maps a complex variable space to a simpler, often one-dimensional,
feature space, which makes defining the decision boundary easier.

•  The activation function is usually an analytic function, so its derivatives
can easily be found, which avoids errors due to numerical approximation.

•  An ANN is usually a nonlinear function of its weights. This nonlinearity
makes it suitable for capturing higher-order correlations and more subtle
covariances among variables, and therefore capable of finding a decision
surface more easily.

•  ANNs tend to be very robust (especially true for the Error Back Propagation,
EBP, architecture), so they are less likely to give substantial errors if
parameters are changed slightly.

•  ANNs are extremely flexible in the number of parameters, nodes, and layers,
and hence offer more freedom in selecting the topology of the network.

Although a number of artificial neural structures [20, 21, 22] exist and more
continue to appear as research continues, many of these structures have common
properties. Basically, three entities characterize an ANN:

1.  The network topology, or interconnection of neural units.

2.  The characteristics of the basic processing units or artificial neurons.

3.  The strategy for pattern learning or training.

These are the key characteristics that distinguish one Neural Network structure
from another. In this chapter we focus on a multilayer Feed-Forward Network
structure with the gradient based algorithm for training, because that is the
architecture we have adopted in our data analysis.


Figure 3.1: Multilayer Feed-Forward Neural Network with one hidden layer.

3.2    Feed-forward  Neural  Structure 

The  Feed-forward  network  is  composed  of  a  hierarchy  of  basic  processing  units, 
organized  in  a  series  of  two  or  more  mutually  exclusive  sets  of  neurons  or  layers. 
The  first  or  input  layer  serves  as  a  distributing  layer.  No  processing  is  performed 
at  this  layer.  The  last,  or  output  layer  is  the  point  at  which  the  final  state  of  the 
network  is  read.  A  minimal  network  should  have  at  least  these  two  layers.  Between 
these  two  layers,  there  can  be  zero  or  more  layers  of  hidden  units.  The  net  output 
is  a  function  of  the  input  vector  x  and  the  set  of  internal  memory  parameters, 
{W}  as  follows: 

\[
O_{net} = F(x, \{W\})
\tag{3.1}
\]

Processing units in each layer contribute to the input of the units in the next layer.
Each unit has an internal memory in the form of a weight vector W which is subject
to change during the learning phase. The unit forms the dot product of this vector
with its input vector, and the result is acted upon by a function which we refer to
as the activation function; this gives the output of the unit. A processing unit
normally has an internal bias (threshold) term, but this term can be absorbed into
the weight vector with the corresponding input component taken to be one.

\[
y_j = f(x^T W_j) = f\left( \sum_{k=0}^{n} W_{jk}\, x_k \right)
\tag{3.2}
\]

Here, $y_j$ is the output of the jth unit and the $x_k$ are the components of the
n-dimensional input vector, weighted by the internal memory vector $W_j$. Hence,
the index j runs over the number of processing units in a given layer and the index k
runs over the number of inputs to a given unit (i.e., n). Note that the index k begins
at zero, since we are taking $W_{j0}$ to be the unit's internal bias (threshold), and hence
$x_0$ is always equal to one. The processing units in each layer continue the
processing of the input from the previous layer, and their outputs become the input
for the next layer. The cumulative effect of the processing done in each unit ultimately
determines the network's outcome. Therefore, identifying appropriate weight
parameters is the key to the performance of a network, and we will discuss, in the
next section, methods for adjusting these weight terms (learning method) so that
the net is optimized.
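
The absorption of the bias into the weight vector in Eq. 3.2 can be illustrated with a minimal sketch. It assumes a sigmoid activation (introduced in Section 3.4) and an arbitrary layer of three units with four inputs; the names `layer_output` and `sigmoid` and the random weights are illustrative only.

```python
import numpy as np

def sigmoid(s, a=1.0):
    """Illustrative activation function f(S) = 1 / (1 + exp(-a S))."""
    return 1.0 / (1.0 + np.exp(-a * s))

def layer_output(W, x, f=sigmoid):
    """y_j = f(sum_k W[j, k] * x_ext[k]), where x_ext[0] = 1 carries the bias."""
    x_ext = np.concatenate(([1.0], x))  # prepend x_0 = 1
    return f(W @ x_ext)

rng = np.random.default_rng(0)
x = rng.normal(size=4)        # n = 4 inputs
W = rng.normal(size=(3, 5))   # 3 units; column 0 holds the biases W_j0

y = layer_output(W, x)

# The same result with an explicit bias vector and a bias-free weight matrix.
y_explicit = sigmoid(W[:, 1:] @ x + W[:, 0])
```

The two computations agree, which is why the bias needs no separate treatment in the weight-update formulas that follow.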

Let us consider the specific network architecture shown in Figure 3.1. The
network is simple, with one hidden layer, but displays all characteristics of a multilayer
Feed-Forward Network. We need to establish specific notation to demonstrate its
operation; this notation will be used in the next section to calculate weight
corrections. We use the following notation for our example:

$x(k)$       =   Component of the input vector, x                  (3.3)
$y(j)$       =   Output of the hidden units, y                     (3.4)
$O(i)$       =   Output of the network, output units, O            (3.5)
$W_o(i,j)$   =   Weights associated with the output units          (3.6)
$W_h(j,k)$   =   Weights associated with the hidden units          (3.7)
$N_{in}$     =   Dimension of the input vector, input units, x     (3.8)
$N_h$        =   Number of the units in the hidden layer           (3.9)
$N_{out}$    =   Dimension of the output vector, output units      (3.10)


For this network, the input vector, x, is processed by units in the first hidden layer
and the corresponding outputs, $y(j)$, are dispatched to the next layer:

\[
y(j) = f\left( \sum_{k=0}^{N_{in}} W_h(j,k)\, x(k) \right)
\tag{3.11}
\]

These outputs, $y = (y_1, y_2, \ldots, y_{N_h})$, in turn, are processed in the next layer
(output layer) in a similar fashion:

\[
O(i) = f\left( \sum_{j=0}^{N_h} W_o(i,j)\, y(j) \right)
\tag{3.12}
\]
\[
O(i) = f\left( W_o(i,0) + \sum_{j=1}^{N_h} W_o(i,j)\, f\left( \sum_{k=0}^{N_{in}} W_h(j,k)\, x(k) \right) \right)
\tag{3.13}
\]

The components of the output vector define the desired mapping of the
$N_{in}$-dimensional space to the $N_{out}$-dimensional space. It is straightforward to extend
this procedure to networks with two or more hidden layers, although in practice it is
very uncommon to need a network with more than two hidden layers.

From the discussion so far, it is clear that for this mapping to work we have to
define a criterion which optimizes a certain physical observable. The most common
criterion is minimization of the Euclidean distance between the output vector and
some target (training) vector. We will discuss this in more detail later. The
key is, therefore, to adjust the units' internal memory (weight parameters) so that
the desired optimization is accomplished. This is done during the learning phase,
where a sample of input vectors with their known outputs is used to tune the
weights and the thresholds. In general, there is a large number of weight terms that
need to be adjusted. For example, for a network with $N_{in}$ input neurons,
$N_h$ hidden neurons, and $N_{out}$ output neurons, the number of weight parameters,
$N(W)$, is:

\[
N(W) = (N_{in} + 1) N_h + (N_h + 1) N_{out}
\tag{3.14}
\]

In  the  next  section  we  explore  some  possible  methods  of  adjusting  these  weight 
parameters. 
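
Before turning to learning, the forward pass through the one-hidden-layer network of Figure 3.1 and the weight count of Eq. 3.14 can be sketched as follows. This is a minimal illustration that assumes sigmoid activations in both layers and small random initial weights; the layer sizes and function names are arbitrary choices, not those of our analysis.

```python
import numpy as np

def sigmoid(s, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * s))

def init_weights(n_in, n_h, n_out, rng):
    """W_h has shape (N_h, N_in + 1) and W_o has shape (N_out, N_h + 1).

    Column 0 of each matrix holds the bias (threshold) of the unit.
    """
    W_h = rng.normal(scale=0.5, size=(n_h, n_in + 1))
    W_o = rng.normal(scale=0.5, size=(n_out, n_h + 1))
    return W_h, W_o

def forward(W_h, W_o, x):
    """Return the network output O and the hidden outputs y for one input x."""
    x_ext = np.concatenate(([1.0], x))   # x(0) = 1
    y = sigmoid(W_h @ x_ext)             # hidden layer
    y_ext = np.concatenate(([1.0], y))   # y(0) = 1
    O = sigmoid(W_o @ y_ext)             # output layer
    return O, y

n_in, n_h, n_out = 5, 4, 2
W_h, W_o = init_weights(n_in, n_h, n_out, np.random.default_rng(0))
O, y = forward(W_h, W_o, np.ones(n_in))

# Number of adjustable weights, to be compared with Eq. 3.14.
n_weights = W_h.size + W_o.size
```

For these sizes the count is (5 + 1) x 4 + (4 + 1) x 2 = 34, in agreement with Eq. 3.14.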

3.3    Learning  Methods 

For each input stimulus, a desired output configuration is presented to the
network's output units, and the weights are adjusted so as to achieve the desired
input/output mapping. In supervised teaching some external entity, which knows
the desired output, corrects the ANN whenever the output is incorrect. The
designer of the ANN sets up a learning procedure in the form of software code to
be used in the training phase. During the learning stage, the process of correction
and reevaluation is iterated, and the weight update as a function of the training
step is computed as follows:

\[
W_{ij}(t) \to W_{ij}(t+1) = W_{ij}(t) + \Delta W_{ij}(t)
\tag{3.15}
\]

Two  common  methods  of  weight  adjustment  are: 

•  Hebbian learning: A weight increases in proportion to the product of the
activation status of the two neurons involved,

\[
\Delta W_{ij} = \eta\, x_i(t)\, x_j(t).
\tag{3.16}
\]

This reflects the notion that a coupling constant has to be larger if
there is a strong coupling between the input stimulus and the output reaction.
In biological terms, this means that a neural pathway is strengthened each
time it is used.

•  Delta rule learning: Based on reducing the error between the output of a
processing unit and the desired output. This rule implements a gradient
descent along the error function, defined in the space of the weights. It will
be thoroughly treated in the following sections.

\[
\Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}}
\tag{3.17}
\]

Here E is the error function, which could be implementation dependent;
we will show one specific error construct.
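
As a small illustration of the Hebbian rule of Eq. 3.16, the sketch below repeatedly presents one activation pattern and accumulates the weight changes. The pattern, the learning rate, and the function name `hebbian_update` are assumptions chosen for the example.

```python
import numpy as np

def hebbian_update(W, x, eta=0.1):
    """One Hebbian step: Delta W_ij = eta * x_i * x_j (Eq. 3.16)."""
    return W + eta * np.outer(x, x)

x = np.array([1.0, 0.0, 1.0])   # units 0 and 2 are active, unit 1 is silent
W = np.zeros((3, 3))
for _ in range(5):              # present the same pattern five times
    W = hebbian_update(W, x)
```

Only the connection between the two co-active units (and their self-couplings) grows; weights involving the silent unit stay at zero. The Delta rule, in contrast, needs an error function and is worked out in the remainder of this section.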

To  demonstrate  the  actual  procedure  for  calculating  these  weight  corrections 
we  consider  the  network  architecture  introduced  in  the  previous  section,  Figure 
3.1. 

The Delta rule is a paradigm of supervised learning, normally applied to
Feed-forward ANNs. The neurons in the first layer just distribute the input signal to
the next layer without any modification. Suppose we present a set of input
patterns $\{x_p\}$ to the input layer of the network. The index p ranges over the
number of patterns. These input patterns are $N_{in}$-dimensional vectors denoted by
$x = (x_1, x_2, \ldots, x_{N_{in}})$. Corresponding to each input vector there is an output
vector, O, which has dimension $N_{out}$. The dimension of the output vector is
determined by the number of output units. The dimension of the output need not
be the same as that of the input, and the two spaces are in general different. For each
input pattern, assume that the desired target output $T_p$ is known. Then a signed
difference $(O_p(i) - T_p(i))$ can be computed for each component of the output vector
$O_p(i)$, and we can define a quadratic form measuring the discrepancy between the
output $O_p$ and the target $T_p$.

\[
E_p = \frac{1}{2} \left\| O_p - T_p \right\|^2 = \frac{1}{2} \sum_{i=1}^{N_{out}} \left( O_p(i) - T_p(i) \right)^2
\tag{3.18}
\]

where the summation extends over i, the index of the output neurons. $E_p$ is thus
proportional to the square of the Euclidean distance between output and target.
For the entire set of $N_p$ patterns, we define a global error function E which
measures the quality of the approximation to the set $\{T_p\}$ given by the set $\{O_p\}$:

\[
E = \sum_{p=1}^{N_p} E_p
\tag{3.19}
\]

E is a quadratic form in the output errors and depends on the weights through the
outputs; its minimum corresponds to the optimal configuration for the network.
The Delta rule consists of modifying each weight in proportion to the rate of
decrease of the error function with respect to that particular weight (i.e., a
gradient descent), which can be written:

\[
\Delta W(i,j) = -\eta(t) \frac{\partial E}{\partial W(i,j)}
\tag{3.20}
\]

where $\eta$ is a positive quantity (the learning rate) which in general may change
during the training process and which controls the speed of convergence.

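Let us apply the gradient descent to update a weight term in the output layer,
$W_o(i,j)$. Recall here that the output of a dot-product neuron is a function of the
weighted sum of the inputs:

\[
O_p(i) = f\left( \sum_{j} W_o(i,j)\, y_p(j) \right) = f(S_p(i))
\tag{3.21}
\]

where we have defined $S_p(i) = \sum_j W_o(i,j)\, y_p(j)$. Here $y_p(j)$ is the output of the
previous layer (i.e., hidden layer). We can now compute the gradient of the error
function as follows.

\[
\begin{aligned}
\frac{\partial E}{\partial W_o(i,j)} &= \sum_{p=1}^{N_p} \frac{\partial E_p}{\partial W_o(i,j)} \\
&= \sum_{p=1}^{N_p} \frac{\partial E_p}{\partial O_p(i)} \frac{\partial O_p(i)}{\partial S_p(i)} \frac{\partial S_p(i)}{\partial W_o(i,j)} \\
&= \sum_{p=1}^{N_p} \left( O_p(i) - T_p(i) \right) f'(S_p(i))\, y_p(j)
\end{aligned}
\tag{3.22}
\]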

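Thus the weight change can be written as

\[
\begin{aligned}
\Delta W_o(i,j) &= -\eta \sum_{p} \left( O_p(i) - T_p(i) \right) f'(S_p(i))\, y_p(j) \\
&= -\eta \sum_{p} \delta_p(i)\, y_p(j)
\end{aligned}
\tag{3.23}
\]

where $\delta_p(i) = \left( O_p(i) - T_p(i) \right) f'(S_p(i))$.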

Note that the form of the variation of the weight induced by the presentation of
pattern p (i.e., $\Delta_p W_o(i,j)$) is reminiscent of the Hebb rule, in the sense that the
weight change depends on the strength of the "cause" (the value of the component
of the input vector which is related to that particular weight, $y_p(j)$) and on the
strength of the "effect" (in this case the value of the mismatch between output and
target, $\delta_p(i)$). This term includes the derivative of the activation function, since it
gives a measure of the intensity of the reaction of the output neuron to its input (the
weighted sum $S_p(i)$).
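
The output-layer gradient of Eq. 3.22 can be checked against a finite-difference estimate of the error. The sketch below does this for a small random problem; it assumes a sigmoid activation, for which $f'(S) = a f(S)(1 - f(S))$, and treats the hidden-layer outputs as given. All sizes, seeds, and names are illustrative.

```python
import numpy as np

def sigmoid(s, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * s))

def error(W_o, Y, T, a=1.0):
    """E = sum_p 1/2 ||O_p - T_p||^2 for hidden outputs Y of shape (N_p, N_h + 1)."""
    O = sigmoid(Y @ W_o.T, a)
    return 0.5 * np.sum((O - T) ** 2)

def output_gradient(W_o, Y, T, a=1.0):
    """dE/dW_o(i, j) = sum_p delta_p(i) * y_p(j), cf. Eqs. 3.22 and 3.23."""
    O = sigmoid(Y @ W_o.T, a)
    delta = (O - T) * a * O * (1.0 - O)   # (O - T) * f'(S) for the sigmoid
    return delta.T @ Y

rng = np.random.default_rng(1)
n_p, n_h, n_out = 6, 3, 2
# Hidden outputs with y_p(0) = 1 as the bias input.
Y = np.hstack([np.ones((n_p, 1)), rng.uniform(size=(n_p, n_h))])
T = rng.integers(0, 2, size=(n_p, n_out)).astype(float)
W_o = rng.normal(size=(n_out, n_h + 1))

analytic = output_gradient(W_o, Y, T)

# Central finite differences, one weight at a time.
numeric = np.zeros_like(W_o)
eps = 1e-6
for i in range(n_out):
    for j in range(n_h + 1):
        step = np.zeros_like(W_o)
        step[i, j] = eps
        numeric[i, j] = (error(W_o + step, Y, T)
                         - error(W_o - step, Y, T)) / (2.0 * eps)

max_diff = np.max(np.abs(analytic - numeric))
```

A small `max_diff` indicates that the analytic expression and the numerical derivative agree; the weight change of Eq. 3.23 is then simply `-eta * analytic`.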


Figure 3.3: Sigmoidal function for a = 1, 2.

3.4    Bias  Input  and  Activation  Function 

One way of looking at the need for bias inputs is that the inputs to each
unit in the net define an N-dimensional space, and the unit draws a hyperplane
through that space, producing an "on" output on one side and an "off" output
on the other. The weights determine where this hyperplane is in the input space.
Without a bias input, this separating plane is constrained to pass through the
origin of the hyperspace defined by the inputs. In most real world problems that is
not a desirable constraint on the location of the decision boundary, and it would
be more useful if the boundary could lie somewhere else.

Activation functions are needed to introduce nonlinearity into the network.
Without nonlinearity, hidden units would not make nets more powerful than
plain perceptrons (which do not have any hidden units, just input and output
units). The reason is that a composition of linear functions is again a linear
function. It is precisely the nonlinearity (i.e., the capability to represent
nonlinear functions) that makes multilayer networks so powerful. There is no
constraint on the choice of the non-linear function, although for backpropagation
learning it must be differentiable, and it helps if the function is bounded. The most
popular choice is the Sigmoidal function, but we look at other possibilities as follows:

•  Linear units: This is the trivial case of a so-called linear association. In the
context of Feed-Forward Networks, a linear activation function reduces the
network operation to simple matrix multiplication, because the activation
function f of each neuron is a multiplication by some constant number, i.e.

\[
O_p(i) = c \cdot S_p(i),
\tag{3.24}
\]

where c is a constant. Thus the derivative of the activation function
appearing in $\Delta W(i,j)$ is just a constant number. Most often, for simplicity,
this constant is taken to be 1; in this case the term $\delta_p(i)$ represents exactly
the difference between the i-th components of the output and target vectors.
The error function E, in the case of linear units, has only a global minimum.

•  Non-linear units: The fact that the first derivative of the function f appears
prevents the use of this learning rule when neurons with a discontinuous
activation function are present. A variety of functions can be used, though
the most common one is the sigmoid function, which is also the one that we
will be using in our analysis. It is given by:

\[
f(S) = \frac{1}{1 + e^{-aS}}
\tag{3.25}
\]

where $0 < f(S) < 1$. The derivative of the activation function is given by

\[
f'(S) = a f(S) \left( 1 - f(S) \right)
\tag{3.26}
\]

Note that the sigmoid derivative is largest near $f(S) = 1/2$ and vanishes as the
output approaches 0 or 1. In applications where the output units should reach a
binary state, this means that the weight changes are strongest for the units which
are still undecided. This adds to the stability of the system.
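
The identity of Eq. 3.26 and the location of the maximum of the derivative can be verified numerically. The following minimal sketch compares the closed form with a central finite difference for the two slopes a = 1 and a = 2; the grid and the step size are arbitrary choices.

```python
import numpy as np

def sigmoid(s, a=1.0):
    """f(S) = 1 / (1 + exp(-a S)), Eq. 3.25."""
    return 1.0 / (1.0 + np.exp(-a * s))

def sigmoid_prime(s, a=1.0):
    """f'(S) = a f(S) (1 - f(S)), Eq. 3.26."""
    f = sigmoid(s, a)
    return a * f * (1.0 - f)

s = np.linspace(-4.0, 4.0, 81)
h = 1e-5
identity_holds = {}
peak_location = {}
peak_value = {}
for a in (1.0, 2.0):
    central = (sigmoid(s + h, a) - sigmoid(s - h, a)) / (2.0 * h)
    identity_holds[a] = np.allclose(central, sigmoid_prime(s, a), atol=1e-8)
    peak_location[a] = s[np.argmax(sigmoid_prime(s, a))]
    peak_value[a] = sigmoid_prime(0.0, a)
```

The derivative peaks at S = 0, where f = 1/2, with value a/4; far from the origin it becomes very small, so units that are already saturated receive only small weight corrections.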




We apply the Generalized Delta Rule to the output units of a two-layer
Feed-Forward Network, for which the difference from a target pattern is readily
computed. We can explicitly substitute for the activation function to obtain the
weight correction.

\[
\begin{aligned}
\Delta W_o(i,j) &= -\eta \sum_{p=1}^{N_p} \delta_p(i)\, y_p(j) \\
&= -\eta\, a \sum_{p=1}^{N_p} \left( O_p(i) - T_p(i) \right) f(S_p(i)) \left( 1 - f(S_p(i)) \right) y_p(j)
\end{aligned}
\tag{3.27}
\]

3.5    Hidden  Layer  Weight  Update 

The learning rule for the hidden-layer weights, $W_h(j,k)$, is not as obvious as that
for the output layer because we do not have available a set of target values (desired
outputs) for hidden units. However, one may derive a learning rule for hidden units
by attempting to minimize the output layer error. This amounts to propagating
the output errors $(O_p(i) - T_p(i))$ back through the output layer toward the hidden
units in an attempt to estimate "dynamic" targets for these units. Such a learning
rule is termed "error backpropagation" or "backprop learning rule" [12, 13] and
may be viewed as an extension of the delta rule for updating the output layer. To
complete the derivation of backprop for the hidden layer weights, and similar to the
preceding derivation for the output layer weights, gradient descent is performed
on the error measure, E, but this time the gradient is calculated with respect to
the hidden weights, $W_h(j,k)$.

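\[
\begin{aligned}
\frac{\partial E}{\partial W_h(j,k)} &= \sum_{p=1}^{N_p} \frac{\partial E_p}{\partial W_h(j,k)} \\
&= \sum_{p=1}^{N_p} \sum_{i=1}^{N_{out}} \frac{\partial E_p}{\partial O_p(i)} \frac{\partial O_p(i)}{\partial S_p(i)} \frac{\partial S_p(i)}{\partial W_h(j,k)} \\
&= \sum_{p=1}^{N_p} \sum_{i=1}^{N_{out}} \left( O_p(i) - T_p(i) \right) f'(S_p(i)) \frac{\partial S_p(i)}{\partial W_h(j,k)} \\
&= \sum_{p=1}^{N_p} \sum_{i=1}^{N_{out}} \left( O_p(i) - T_p(i) \right) f'(S_p(i)) \sum_{j'=0}^{N_h} W_o(i,j') \frac{\partial y_p(j')}{\partial W_h(j,k)} \\
&= \sum_{p=1}^{N_p} \sum_{i=1}^{N_{out}} \left( O_p(i) - T_p(i) \right) f'(S_p(i))\, W_o(i,j)\, f'(S_p(j))\, x_p(k)
\end{aligned}
\]

where

\[
f'(S_p(i)) = f'\left( \sum_{j=0}^{N_h} W_o(i,j)\, y_p(j) \right),
\tag{3.28}
\]
\[
f'(S_p(j)) = f'\left( \sum_{k=0}^{N_{in}} W_h(j,k)\, x_p(k) \right),
\tag{3.29}
\]

and

\[
f'(S) = a f(S) \left( 1 - f(S) \right).
\tag{3.30}
\]

In analogy to the output layer weights, the correction terms for the hidden layer
weights, $W_h(j,k)$, are calculated as follows.

\[
\Delta W_h(j,k) = -\eta \sum_{p=1}^{N_p} \sum_{i=1}^{N_{out}} \left( O_p(i) - T_p(i) \right) f'(S_p(i))\, W_o(i,j)\, f'(S_p(j))\, x_p(k)
\]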
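
The hidden-layer gradient derived above can be checked in the same way as the output-layer one, by comparing it with a finite-difference estimate of E. The sketch below assumes sigmoid activations in both layers and stores the bias weights in column 0 of each weight matrix; the sizes, seeds, and function names are illustrative.

```python
import numpy as np

def sigmoid(s, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * s))

def add_bias(Z):
    """Prepend a column of ones (the x_0 = 1 or y_0 = 1 bias input)."""
    return np.hstack([np.ones((Z.shape[0], 1)), Z])

def total_error(W_h, W_o, X, T, a=1.0):
    Y = sigmoid(add_bias(X) @ W_h.T, a)
    O = sigmoid(add_bias(Y) @ W_o.T, a)
    return 0.5 * np.sum((O - T) ** 2)

def hidden_gradient(W_h, W_o, X, T, a=1.0):
    """dE/dW_h(j, k) obtained by propagating the output errors backwards."""
    X1 = add_bias(X)
    Y = sigmoid(X1 @ W_h.T, a)
    O = sigmoid(add_bias(Y) @ W_o.T, a)
    delta_out = (O - T) * a * O * (1.0 - O)                    # shape (N_p, N_out)
    # Sum over output units i; the bias column of W_o has no hidden unit behind it.
    delta_hid = (delta_out @ W_o[:, 1:]) * a * Y * (1.0 - Y)   # shape (N_p, N_h)
    return delta_hid.T @ X1                                    # shape (N_h, N_in + 1)

rng = np.random.default_rng(2)
n_p, n_in, n_h, n_out = 5, 3, 4, 2
X = rng.normal(size=(n_p, n_in))
T = rng.uniform(size=(n_p, n_out))
W_h = rng.normal(size=(n_h, n_in + 1))
W_o = rng.normal(size=(n_out, n_h + 1))

analytic = hidden_gradient(W_h, W_o, X, T)

numeric = np.zeros_like(W_h)
eps = 1e-6
for j in range(n_h):
    for k in range(n_in + 1):
        step = np.zeros_like(W_h)
        step[j, k] = eps
        numeric[j, k] = (total_error(W_h + step, W_o, X, T)
                         - total_error(W_h - step, W_o, X, T)) / (2.0 * eps)

max_diff = np.max(np.abs(analytic - numeric))
```

Here `delta_hid` plays the role of the "dynamic" error signal for the hidden units: it is the output error weighted by $W_o(i,j)$ and by the slope of the hidden activation.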

These  learning  equations  may  also  be  extended  to  feed-forward  nets  with  more 
than  one  hidden  layer  and/or  nets  with  connections  that  jump  over  one  or  more 
layers.  The  complete  procedure  for  updating  the  weights  in  a  feed-forward  neural 
net  utilizing  these  rules  is  summarized  bellow. 

1.  Initialize  all  weights  and  refer  to  them  as  W(t). 

2.  Set the learning rate $\eta(t)$ to a small positive value, $0 < \eta(t) < 1$, but allow
its value to be adjusted dynamically according to the rate of convergence.

3.  Select  a  set  of  inputs,  {xp}  from  the  training  set  (preferably  at  random)  and 
propagate  it  through  the  network,  thus  generating  hidden  and  output  unit 
results  based  on  the  current  weight  settings. 


4.  Using an error measure, compare the net output, $O$, to the desired output,
$T$, for the given set of input vectors $\{x_p\}$. If the error has not been reduced to
the desired degree, proceed to adjust the weights.

5.  Apply the algorithm described in this chapter to calculate the correction terms
$\Delta W(t)$ for the output-layer and hidden-layer weights, and use them to calculate the
new weights $W(t+1) = W(t) + \Delta W(t)$.

6.  Compute the error and test for convergence, i.e., check whether the error is below
some preset threshold. If convergence is met, stop; otherwise, go to step 5.
Continue iterating the weight adjustment and testing until convergence
is attained.
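
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this work: it assumes one hidden layer, the sigmoid activation with $f' = a f (1 - f)$ of Equation 3.30, a squared-error measure $E = \frac{1}{2}\sum_p \sum_i (O_p(i) - T_p(i))^2$ (consistent with the error factor $(O_p(i) - T_p(i))$ used above), a fixed learning rate, and bias units stored at index 0. All function names are illustrative.

```python
import math
import random

def f(s, a=1.0):
    """Sigmoid activation with gain a (assumed form of f)."""
    return 1.0 / (1.0 + math.exp(-a * s))

def forward(Wh, Wo, x, a=1.0):
    """Propagate one pattern; index 0 of xb and h is a constant bias unit."""
    xb = [1.0] + list(x)
    h = [1.0] + [f(sum(w * v for w, v in zip(row, xb)), a) for row in Wh]
    O = [f(sum(w * v for w, v in zip(row, h)), a) for row in Wo]
    return xb, h, O

def batch_error(Wh, Wo, patterns, a=1.0):
    """E = 1/2 * sum over patterns and outputs of (O - T)^2 (assumed measure)."""
    total = 0.0
    for x, T in patterns:
        _, _, O = forward(Wh, Wo, x, a)
        total += 0.5 * sum((o - t) ** 2 for o, t in zip(O, T))
    return total

def batch_gradients(Wh, Wo, patterns, a=1.0):
    """Return (dE/dWo, dE/dWh), summed over all patterns (batch learning)."""
    gWo = [[0.0] * len(row) for row in Wo]
    gWh = [[0.0] * len(row) for row in Wh]
    for x, T in patterns:
        xb, h, O = forward(Wh, Wo, x, a)
        # Output-layer factor (O - T) f'(S), using f' = a f (1 - f).
        d_out = [(o - t) * a * o * (1.0 - o) for o, t in zip(O, T)]
        for i, d in enumerate(d_out):
            for j, hj in enumerate(h):
                gWo[i][j] += d * hj
        # Hidden-layer factor: output errors propagated back through Wo.
        for j in range(len(Wh)):
            hj = h[j + 1]  # h[0] is the bias unit
            back = sum(d_out[i] * Wo[i][j + 1] for i in range(len(Wo)))
            d_hid = back * a * hj * (1.0 - hj)
            for k, xk in enumerate(xb):
                gWh[j][k] += d_hid * xk
    return gWo, gWh

def train(patterns, n_hidden=3, eta=0.5, a=1.0, max_iter=200, tol=1e-3, seed=0):
    """Steps 1-6: initialize, propagate, test the error, correct the weights."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    Wh = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    Wo = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(max_iter):
        if batch_error(Wh, Wo, patterns, a) < tol:
            break
        gWo, gWh = batch_gradients(Wh, Wo, patterns, a)
        Wo = [[w - eta * g for w, g in zip(wr, gr)] for wr, gr in zip(Wo, gWo)]
        Wh = [[w - eta * g for w, g in zip(wr, gr)] for wr, gr in zip(Wh, gWh)]
    return Wh, Wo
```

Comparing one entry returned by `batch_gradients` with a finite-difference estimate of `batch_error` is a quick way to check that the back-propagated derivative is implemented correctly.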

It should be noted that backprop may fail to find a solution that passes the convergence
test. In this case, one may try to reinitialize the search process, readjust the
learning parameters (i.e., $\eta$, $\alpha$), restructure the network, or use more hidden units or
even more hidden layers, although a network with more than two hidden layers
is unlikely to result in improved performance. It is here that a feed-forward
neural network displays its weakness, because there is no rigorous method for finding
its optimum size and architecture.

This procedure is based on "batch learning", where weight updating is performed
after $N_p$ patterns have been presented to the network. An alternative
is "incremental learning", in which the weights are updated after
every presentation of an input pattern:

\[ \Delta W(i,j) = -\eta\, \frac{\partial E_p}{\partial W(i,j)}. \tag{3.31} \]

One can also have "approximate incremental learning", where $N_p$ is much
smaller than the total number of patterns in the sample; in that case the size of the set
used in updating, $N_p$, is referred to as an "epoch". Although batch updating moves
the search point $W$ in the direction of the true gradient at each update step,
incremental updating is more desirable for two reasons: (1) it requires less storage,
and (2) it makes the search path in the weight space stochastic, since at each step
the input vector $x$ is drawn at random. This allows for a wider exploration of the
search space and, potentially, leads to better solutions. When backprop converges,
it converges to a local minimum of the error function. This is true of any
gradient-descent-based learning rule when the surface being searched is nonconvex,
i.e., when it admits local minima. Using stochastic approximation theory, Finnoff
[23, 24] showed that for "very small" learning rates (approaching zero), incremental
backprop approaches batch backprop and produces essentially the same results.
However, for a small but nonzero learning rate the cumulative gradient calculated in the
incremental backprop method is continuously perturbed, which allows the search to explore
a wider region of the weight space and to escape local minima with shallow basins of attraction.

3.6    Local  Minima,  Flat  Spaces,  and  Overfitting 

Feed-forward neural networks trained with the error-backpropagation algorithm have been
a successful technique in real-world applications. Yet there are some potential
sources of problems in applying this technique. We now discuss some of these
problems and offer ways to avoid them.

The Generalized Delta Rule with backpropagation implements a gradient descent
in the weight space, but there are several possible minima of the error function.
An empirical way out of this difficulty is to choose the learning
parameter $\eta$ appropriately. An efficient way is to allow the value of $\eta$ to be adjusted
dynamically. Also, note that as the system approaches locally flat regions, perhaps a local
minimum, the value of the learning parameter can increase dramatically, and if it
is not bounded, the training session will come to a halt. It has been our experience
that using a dynamically adjusted rate drastically reduces the time required for
learning [25].

To speed up the descent [26, 27], especially in the first learning iterations, a
momentum term $\alpha$ is used, which carries over the fact that the weight update in the
previous iteration was still large, meaning that one is far from the minimum and
can move faster. The actual prescription for the weight updating then becomes

\[ \Delta_n W(i,j) = -\eta\, \frac{\partial E}{\partial W(i,j)} + \alpha\, \Delta_{n-1} W(i,j), \tag{3.33} \]

where $n$ indicates the $n$-th iteration. Often, instead of computing the error function
for the entire set of input vectors and then performing the weight updating,
one makes the update after a certain smaller number of vector presentations, an
"epoch".

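The momentum prescription of Equation 3.33 can be illustrated with a short Python sketch. The toy error function $E(w) = \frac{1}{2}\sum_k c_k w_k^2$, the curvature values, and the function names are assumptions made for illustration only; they are not taken from this work.

```python
def momentum_update(weights, grads, prev_deltas, eta=0.1, alpha=0.9):
    """One step of Delta_n W = -eta * dE/dW + alpha * Delta_{n-1} W."""
    deltas = [-eta * g + alpha * d for g, d in zip(grads, prev_deltas)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas

def minimize_quadratic(w0, eta=0.1, alpha=0.9, n_iter=300):
    """Toy problem (assumed): E(w) = 1/2 * sum(c_k * w_k^2), dE/dw_k = c_k * w_k."""
    curvatures = [1.0, 0.05]  # one steep direction and one nearly flat direction
    w, deltas = list(w0), [0.0] * len(w0)
    for _ in range(n_iter):
        grads = [c * wk for c, wk in zip(curvatures, w)]
        w, deltas = momentum_update(w, grads, deltas, eta, alpha)
    return w
```

Calling `minimize_quadratic([1.0, 1.0])` and `minimize_quadratic([1.0, 1.0], alpha=0.0)` shows the effect of the momentum term along the nearly flat direction, where plain gradient descent advances only slowly.
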
Overfitting, often also called "overtraining" or "overlearning", is the phenomenon
in which the performance of a network gets worse instead of better after a certain point during training.
This happens because prolonged training may make the network "memorize" the training
patterns, including all of their peculiarities. However, one is usually interested in
the generalization of the network, i.e., the error it exhibits on examples not seen
during training. Learning the peculiarities of the training set makes the generalization
worse. The network should only learn the general structure of the examples.

There are various methods to fight overfitting. The two most important classes
of such methods are regularization methods (such as weight decay) and early stopping.
Regularization methods try to limit the complexity of the network such that
it is unable to learn peculiarities. In general, the more complex the networks are
(more layers and more processing units), the more susceptible they are to overtraining.
Therefore, reducing the number of neurons, and consequently of weight parameters
("pruning"), can help prevent overtraining. Early stopping is a simpler approach; it
aims at stopping the training at the point of optimal generalization. This method
requires constant monitoring of the network performance by testing it against an
unseen set of patterns.
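
Early stopping as described here can be sketched as follows. The stopping criterion (stop once the error on the unseen set has not improved for `patience` consecutive checks) and the function name are assumptions chosen for illustration; other criteria are possible.

```python
def early_stopping(validation_errors, patience=3):
    """Scan validation errors in training order and return (best_iteration, best_error).

    `validation_errors` may be any iterable, e.g. a generator that performs one
    training pass and then yields the error on the unseen set. Scanning stops
    once `patience` checks in a row have failed to improve on the best error.
    """
    best_iter, best_err = -1, float("inf")
    for it, err in enumerate(validation_errors):
        if err < best_err:
            best_iter, best_err = it, err
        elif it - best_iter >= patience:
            break
    return best_iter, best_err
```

In practice one would also store a copy of the weights at `best_iter`, so that the network with the best generalization is the one that is kept.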

Training  a  network  can  be  a  very  lengthy  process  and  one  of  the  important 
tasks  of  a  net  designer  is  to  optimize  the  training  speed  of  the  network.  A  number 
of  techniques  exist  to  accelerate  the  process. 

•  random  generation  of  weights  before  starting  the  training. 

•  using  a  dynamic  learning  parameter  as  opposed  to  using  a  fixed  rate. 

•  optimization of the number of neurons and layers. This is a very loose recommendation
because it is very hard to define an optimum size for the network.
There are some pruning techniques which help reduce the complexity of
the network. There are also some general rules, based purely on
experience: start with only one hidden layer and
move to two hidden layers only if the performance of the net is not satisfactory.
With more than one hidden layer the learning algorithm is considerably
more complex and the training phase is much longer.

There  is  no  way  to  determine  a  good  network  topology  just  from  the  number  of 
inputs  and  outputs.  It  depends  critically  on  the  number  of  training  examples  and 
the  complexity  of  the  classification  problem  at  hand.  Some  people  offer  general 
rules  for  choosing  a  topology  but  experience  shows  that  networks  can  be  quite 
insensitive  to  the  number  of  units.  Other  rules  relate  to  the  number  of  examples 
available:  Use  at  most  so  many  hidden  units  that  the  number  of  weights  in  the 
network  times  10  is  smaller  than  the  number  of  examples.  Such  rules  are  only 
concerned  with  overfitting  and  are  unreliable  as  well.  Also,  there  is  no  precise  rule 
as to how many patterns must be used in the learning sample. On a heuristic basis,
it is advisable to have a number of input patterns at least a few times the number
of weights to be optimized.
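
As a worked example of the rule of thumb quoted above (ten times the number of weights should be smaller than the number of examples), the following sketch counts the weights of a network with one hidden layer. It assumes one bias weight per hidden and output unit; the function names are illustrative.

```python
def n_weights(n_in, n_hidden, n_out):
    """Weights of a one-hidden-layer net, counting one bias per unit (assumed)."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def max_hidden_units(n_in, n_out, n_examples, factor=10):
    """Largest hidden layer with factor * n_weights < n_examples."""
    h = 0
    while factor * n_weights(n_in, h + 1, n_out) < n_examples:
        h += 1
    return h
```

For 10 inputs, one output, and 5000 training examples this gives at most 41 hidden units (493 weights). As noted above, such rules are unreliable and should be treated only as a starting point.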

3.7    Statistical  Classifiers 

The Bayes classifier is the optimal classifier in the statistical approach, in the sense that it minimizes
the probability of classification error. It is, however, very general in its scope and,
as a result, very difficult to implement. In practice, because of the complexity of the
problem, particularly when the dimensionality is high, people often resort to simpler
parametric classifiers. In this chapter we study Bayes classifiers and some simple
parametric methods used in classification [28].

In pattern recognition [29, 30], we deal with random vectors belonging to different
classes, each of which is characterized by its own density function. In other words,
given the vector $x$ we want to find the probability of $x$ belonging to class $\omega_i$. This
is referred to as the "a posteriori probability" and will be denoted by $q(\omega_i|x)$, or $q_i(x)$ for
short. Here $\omega_i$ stands for class $i$, and the corresponding density function is referred
to as the conditional density of class $i$ and will be denoted by $p(x|\omega_i)$, or $p_i(x)$, where
$i = 1, 2, \ldots, L$. The a posteriori probability of $\omega_i$ given $x$ can be computed as
follows:

\[ q_i(x) = \frac{\pi_i\, p_i(x)}{p(x)}, \]

where $\pi_i$ is the "a priori probability", or relative frequency, of class $i$, and $p(x)$ is the
mixture density, which is given by

\[ p(x) = \sum_{i=1}^{L} \pi_i\, p_i(x). \]

The mixture density is, therefore, the sum of the conditional densities of the classes
weighted by their relative frequencies, or a priori probabilities.
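
A small numerical illustration of the a posteriori probability is given below. The one-dimensional Gaussian conditional densities, their parameters, and the function names are assumptions made only to have concrete densities to evaluate.

```python
import math

def gaussian(x, mean, sigma):
    """One-dimensional normal density, used here as an example conditional density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, densities):
    """q_i(x) = pi_i p_i(x) / p(x), with p(x) the mixture density."""
    weighted = [pi * p(x) for pi, p in zip(priors, densities)]
    mixture = sum(weighted)
    return [w / mixture for w in weighted]

# Two classes with unit width, centred at 0 and at 2 (illustrative values).
densities = [lambda x: gaussian(x, 0.0, 1.0), lambda x: gaussian(x, 2.0, 1.0)]
q_equal = posteriors(1.0, [0.5, 0.5], densities)    # midpoint, equal priors
q_unequal = posteriors(1.0, [0.9, 0.1], densities)  # midpoint, unequal priors
```

At the midpoint between the two means the conditional densities are equal, so the a posteriori probabilities simply reproduce the a priori probabilities.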
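In the discussion that follows, the following notation is used:

\[
\begin{array}{ll}
p_i(x) = p_i(x_1, x_2, \ldots, x_n) & \text{conditional density function of } \omega_i \\[4pt]
p(x) = \sum_{i=1}^{L} \pi_i\, p_i(x) & \text{mixture density function} \\[4pt]
\pi_i & \text{a priori probability of } \omega_i \\[4pt]
q_i(x) = \dfrac{\pi_i\, p_i(x)}{p(x)} & \text{a posteriori probability of } \omega_i \text{ given } x \\[4pt]
M_i = E\{X|\omega_i\} & \text{expected vector of } \omega_i \\[4pt]
M = E\{X\} = \sum_{i=1}^{L} \pi_i M_i & \text{expected vector of the mixture density} \\[4pt]
\Sigma_i = E\{(X - M_i)(X - M_i)^T|\omega_i\} & \text{covariance matrix of } \omega_i \\[4pt]
\Sigma = E\{(X - M)(X - M)^T\} & \text{covariance matrix of the mixture density}
\end{array}
\]

The last expression can be written explicitly as

\begin{align*}
\Sigma &= E\{(X - M)(X - M)^T\} \tag{3.34} \\
       &= \sum_{i=1}^{L} \left\{ \pi_i \Sigma_i + \pi_i (M_i - M)(M_i - M)^T \right\}. \tag{3.35}
\end{align*}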

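The expected vector, or mean, of a random vector $X$ is calculated by

\[ M = E\{X\} = \int x\, p(x)\, dx. \]

The $i$th component is given by

\[ m_i = \int x_i\, p(x)\, dx = \int x_i\, p(x_i)\, dx_i, \]

where $p(x_i)$ is the marginal density of the $i$th component of $x$,

\[ p(x_i) = \int_{(n-1)\,\mathrm{space}} p(x)\, dx_1 \ldots dx_{i-1}\, dx_{i+1} \ldots dx_n. \]

The conditional expectation value of the random vector $X$ is given by

\[ M_i = E\{X|\omega_i\} = \int x\, p_i(x)\, dx. \]

The covariance matrix is given by

\[
\Sigma = E\{(X - M)(X - M)^T\}
 = E\left\{ \begin{pmatrix} x_1 - m_1 \\ \vdots \\ x_n - m_n \end{pmatrix}
   \begin{pmatrix} x_1 - m_1 & \ldots & x_n - m_n \end{pmatrix} \right\}
 = \begin{pmatrix} c_{11} & \ldots & c_{1n} \\ \vdots & \ddots & \vdots \\ c_{n1} & \ldots & c_{nn} \end{pmatrix},
\]

where

\[ c_{ij} = E\{(x_i - m_i)(x_j - m_j)\}. \]

$\Sigma$ is a symmetric matrix; its diagonal terms are the variances of the individual
random variables and its off-diagonal components are the covariances of pairs of
random variables. It may also be written as

\[
\begin{aligned}
\Sigma &= E\{(X - M)(X - M)^T\} \\
       &= E\{XX^T\} - E\{X\}M^T - M E\{X^T\} + MM^T \\
       &= E\{XX^T\} - MM^T \\
       &= S - MM^T,
\end{aligned}
\]

where $S$ is called the "autocorrelation" matrix,

\[
S = E\{XX^T\}
 = \begin{pmatrix} E\{x_1 x_1\} & \ldots & E\{x_1 x_n\} \\ \vdots & \ddots & \vdots \\ E\{x_n x_1\} & \ldots & E\{x_n x_n\} \end{pmatrix},
\]

and $\Sigma$ can be decomposed as

\[ \Sigma = \Gamma R \Gamma, \qquad c_{ij} = \sigma_i\, \rho_{ij}\, \sigma_j, \tag{3.36} \]

with

\[
\Gamma = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_n \end{pmatrix},
\qquad
R = \begin{pmatrix} 1 & \rho_{12} & \ldots & \rho_{1n} \\ \rho_{21} & 1 & & \vdots \\ \vdots & & \ddots & \\ \rho_{n1} & \ldots & & 1 \end{pmatrix}.
\tag{3.37}
\]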


Therefore, $\Sigma$ can be written as a combination of two types of matrices. One is the
diagonal matrix, $\Gamma$, of standard deviations and the other is the matrix, $R$, of the correlation
coefficients. The standard deviations depend on the scale of the coordinate
system, while the correlation matrix $R$ is independent of that scale
and therefore retains the essential information about the relation between the random
variables.
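
The decomposition of the covariance matrix into standard deviations and correlation coefficients can be checked numerically with the sketch below. It estimates the expectations by sample averages with a $1/N$ normalization and assumes every variable has nonzero variance; the sample values and function names are illustrative.

```python
import math

def mean_vector(samples):
    """Sample estimate of M = E{X}."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def covariance_matrix(samples):
    """Sample estimate of c_ij = E{(x_i - m_i)(x_j - m_j)} with 1/N normalization."""
    n = len(samples)
    m = mean_vector(samples)
    dim = len(m)
    return [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / n
             for j in range(dim)] for i in range(dim)]

def decompose(cov):
    """Return the standard deviations sigma_i and the correlation matrix rho_ij."""
    dim = len(cov)
    sigmas = [math.sqrt(cov[i][i]) for i in range(dim)]  # assumes nonzero variances
    R = [[cov[i][j] / (sigmas[i] * sigmas[j]) for j in range(dim)] for i in range(dim)]
    return sigmas, R

samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
sigmas, R = decompose(covariance_matrix(samples))

# Rescaling the first coordinate changes sigma_1 but leaves R unchanged.
rescaled = [(100.0 * x, y) for x, y in samples]
sigmas_scaled, R_scaled = decompose(covariance_matrix(rescaled))
```

For these sample points the correlation coefficient is 0.6 before and after the rescaling, while the first standard deviation grows by the factor 100.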

3.8    Normal  Distributions 
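The general form of a normal distribution is

\[ N_X(M, \Sigma) = (2\pi)^{-n/2}\, |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, d^2(X) \right\}, \]

where $N_X(M, \Sigma)$ is a shorthand for the normal distribution with expected vector
$M$ and covariance matrix $\Sigma$, and

\begin{align*}
d^2(X) &= (X - M)^T \Sigma^{-1} (X - M) \tag{3.38} \\
       &= \mathrm{tr}\left\{ \Sigma^{-1} (X - M)(X - M)^T \right\} \\
       &= \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}\, (x_i - m_i)(x_j - m_j),
\end{align*}

where $h_{ij}$ is the $i,j$th component of $\Sigma^{-1}$. The form of the exponent shows that the
normal distribution is a function of a distance function, $d^2(X)$, which is a positive
definite quadratic function of the random vector $X$. The coefficient $(2\pi)^{-n/2} |\Sigma|^{-1/2}$
is the normalization constant.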

Normal distributions are widely used because of their many important properties.
Here are some of the main features.

1.  The parameters that specify the normal distribution, $M$ and $\Sigma$, are sufficient
to characterize it uniquely. All higher moments of a normal distribution can
be calculated as functions of these parameters.

2.  In a normal distribution, if the individual $x_i$'s are mutually uncorrelated,
then they are also independent.

3.  The marginal densities and the conditional densities of a normal distribution
are all normal.

4.  The characteristic function of a normal distribution has a normal form,

\[ \psi(\Omega) = E\{\exp(j\,\Omega^T X)\} = \exp\left\{ j\,M^T \Omega - \tfrac{1}{2}\, \Omega^T \Sigma\, \Omega \right\}, \tag{3.39} \]

where $\Omega = [\omega_1, \ldots, \omega_n]^T$ and $\omega_i$ is the $i$th frequency component.

5.  Under any nonsingular linear transformation, the distance function, $d^2(X)$,
keeps its quadratic form and does not lose its positive definiteness. Therefore,
after a nonsingular linear transformation a normal distribution becomes
another normal distribution with different parameters. This implies that it is
always possible to find a nonsingular linear transformation which makes the
new covariance matrix diagonal. Since a diagonal covariance matrix means
uncorrelated variables, we can always find for a normal distribution a set of
axes such that the random variables are independent in the new coordinate
system.

6.  The assumption of normality is a reasonable approximation in many applied
cases. This is particularly true when a larger set of random variables is
selected, in which case the "central limit theorem" can be applied.

3.9    Bayes Classifiers

Consider two classes of objects with corresponding sample representations $\omega_1$
and $\omega_2$. The conditional density functions and the a priori probabilities are
assumed to be known. The Bayes test provides a quantitative measure of the separation of
these two classes. According to the Bayes test, given an observed vector $X$, in order
to identify whether it belongs to class $\omega_1$ or $\omega_2$, a decision rule based on probability
may be written as follows:

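\[ q_1(x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} q_2(x), \tag{3.40} \]

where $q_i$ is the a posteriori probability of $\omega_i$ given $X$. If the probability of $\omega_1$ is
larger than the probability of $\omega_2$, $X$ is classified as $\omega_1$, and vice versa.

\[ q_i(x) = \frac{\pi_i\, p_i(x)}{p(x)}. \tag{3.41} \]

Since $p(X)$ is positive and common to both sides of the inequality, the decision
rule can be expressed as

\[ \pi_1\, p_1(x) \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \pi_2\, p_2(x), \tag{3.42} \]

or

\[ \ell(x) = \frac{p_1(x)}{p_2(x)} \underset{\omega_2}{\overset{\omega_1}{\gtrless}} \frac{\pi_2}{\pi_1}, \tag{3.43} \]

where $\ell(x)$ is called the "likelihood ratio". The quantity $\pi_2/\pi_1$ is the threshold
value of the decision boundary. Let us define another quantity $h(x)$ as follows.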

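\begin{align*}
h(x) &= -\ln \ell(x) \\
     &= -\ln p_1(x) + \ln p_2(x) \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} \ln\frac{\pi_1}{\pi_2}. \tag{3.44}
\end{align*}

The quantity $h(x)$ is called the "discriminant function". For $\pi_1 = \pi_2$ the threshold
$\ln(\pi_1/\pi_2)$ is zero and the discriminant function reduces to

\[ h(x) = -\ln p_1(x) + \ln p_2(x). \tag{3.45} \]

This is the Bayes test for minimum error. The error is computed by

\[
\varepsilon = \pi_1 \int_{L_2} p_1(x)\, dx + \pi_2 \int_{L_1} p_2(x)\, dx
            = \pi_1 \varepsilon_1 + \pi_2 \varepsilon_2,
\]

where $L_i$ is the region in which $x$ is assigned to $\omega_i$ and

\[ \varepsilon_1 = \int_{L_2} p_1(x)\, dx, \qquad \varepsilon_2 = \int_{L_1} p_2(x)\, dx. \]

Finally, for $\pi_1 = \pi_2$ the decision boundary, or decision hypersurface, is given by
$h(x) = -\ln p_1(x) + \ln p_2(x) = 0$; that is, it is the solution of

\[ \ln p_1(x) = \ln p_2(x). \]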

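In general, the decision boundary is either very difficult to calculate or not available in closed form.
When each $p_i(x)$ is normal with expected vector $M_i$ and covariance matrix $\Sigma_i$, the decision rule is

\[ h(x) = -\ln p_1(x) + \ln p_2(x) \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} \ln\frac{\pi_1}{\pi_2}, \]

where

\[
p_i(x) = (2\pi)^{-n/2}\, |\Sigma_i|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, d_i^2(x) \right\},
\qquad
d_i^2(x) = (X - M_i)^T \Sigma_i^{-1} (X - M_i),
\]

so that

\[
h(x) = \tfrac{1}{2} (X - M_1)^T \Sigma_1^{-1} (X - M_1)
     - \tfrac{1}{2} (X - M_2)^T \Sigma_2^{-1} (X - M_2)
     + \tfrac{1}{2} \ln\frac{|\Sigma_1|}{|\Sigma_2|}
     \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} \ln\frac{\pi_1}{\pi_2}.
\]

For normal distributions with $\Sigma_1 = \Sigma_2 = \Sigma$ the decision boundary becomes a
linear function of $X$:

\[
h(x) = (M_2 - M_1)^T \Sigma^{-1} X
     + \tfrac{1}{2} \left( M_1^T \Sigma^{-1} M_1 - M_2^T \Sigma^{-1} M_2 \right)
     \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} \ln\frac{\pi_1}{\pi_2}. \tag{3.46}
\]

A special case of the linear discriminant is $\Sigma = I$, the identity matrix:

\[
h(x) = (M_2 - M_1)^T X + \tfrac{1}{2} \left( M_1^T M_1 - M_2^T M_2 \right)
     \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} \ln\frac{\pi_1}{\pi_2}. \tag{3.47}
\]

One can make a geometric interpretation of the above expression as follows. If we
add and subtract the term $\frac{1}{2} X^T X$ and do a little manipulation, we get

\[
h(x) = \tfrac{1}{2} \left( \|X - M_1\|^2 - \|X - M_2\|^2 \right)
     \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} \ln\frac{\pi_1}{\pi_2}. \tag{3.48}
\]

So for this special case, the decision rule has the geometric interpretation of comparing
the Euclidean distances from $X$ to $M_1$ and from $X$ to $M_2$, with their difference compared to
the threshold term. When $\pi_1 = \pi_2 = \frac{1}{2}$, the decision boundary is the perpendicular
bisector of the line joining $M_1$ and $M_2$.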
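
The discriminant function for two normal classes can be evaluated directly, as in the following sketch. It is restricted to two dimensions so that the matrix inverse can be written out by hand; the means, covariance matrices, priors, and function names are illustrative assumptions, and a point lying exactly on the boundary is arbitrarily assigned to class 2.

```python
import math

def inv2(S):
    """Inverse and determinant of a 2 x 2 matrix."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inverse = [[S[1][1] / det, -S[0][1] / det],
               [-S[1][0] / det, S[0][0] / det]]
    return inverse, det

def d2(x, M, S):
    """Distance function d^2(x) = (x - M)^T S^{-1} (x - M)."""
    S_inv, _ = inv2(S)
    dx = [x[0] - M[0], x[1] - M[1]]
    return sum(dx[i] * S_inv[i][j] * dx[j] for i in range(2) for j in range(2))

def h(x, M1, S1, M2, S2):
    """Discriminant function h(x) = -ln p1(x) + ln p2(x) for two normal classes."""
    _, det1 = inv2(S1)
    _, det2 = inv2(S2)
    return 0.5 * d2(x, M1, S1) - 0.5 * d2(x, M2, S2) + 0.5 * math.log(det1 / det2)

def classify(x, M1, S1, M2, S2, pi1=0.5, pi2=0.5):
    """Return 1 if h(x) is below the threshold ln(pi1 / pi2), otherwise 2."""
    return 1 if h(x, M1, S1, M2, S2) < math.log(pi1 / pi2) else 2

identity = [[1.0, 0.0], [0.0, 1.0]]
M1, M2 = (0.0, 0.0), (2.0, 0.0)
```

With identity covariance matrices and equal priors, `classify` reduces to choosing the nearer mean; raising the prior of class 1 moves the boundary toward $M_2$, so that, for example, the point (1.5, 0) is assigned to class 1 when $\pi_1 = 0.9$.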

Linear classifiers are the simplest ones as far as implementation is concerned.
They are preferred because they are simpler to handle and
more robust, which outweighs their lack of optimality. However, in the Bayes sense,
linear classifiers are optimum only for normal distributions with equal covariance
matrices. This strict requirement makes linear classifiers inapplicable to most
real-world problems. In order to overcome this limited scope of the linear classifiers
while still remaining close to the spirit of the Bayes theorem, we will try to design a linear
classifier for normal distributions with unequal covariance matrices. This method
is called the Fisher discriminant and is the topic of discussion in the next section.

3.10    Fisher  Discriminants 

In analogy with our earlier derivation, we want to find a linear discriminant
function $h(x)$ which has the general form

\[ h(x) = V^T X + v_0 \underset{\omega_2}{\overset{\omega_1}{\lessgtr}} 0. \tag{3.49} \]

We want to find the optimum coefficients $V$ and the threshold value $v_0$ for given distributions
under various criteria of physical interest. There are a number of criteria
to be considered for optimization, but the Fisher criterion is the most commonly
used measure of the difference between two classes and it has the following form:

\[ f = \frac{(\eta_1 - \eta_2)^2}{\sigma_1^2 + \sigma_2^2}. \tag{3.50} \]

Here $\eta_i$ and $\sigma_i^2$ denote the mean and variance of $h(x)$ for class $\omega_i$. The criterion
measures the difference of the two means, normalized by the sum of the variances.


Figure 3.4: Fisher criterion aims at maximizing the distance between the means of
the distributions of two normally distributed classes.


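Our objective is to find the projection vector $V$ and the threshold term $v_0$ for
which $(\eta_1 - \eta_2)^2$ is maximized. Now, if $X$ is normally distributed then $h(x)$ is
also normally distributed, and the error in $h$-space is determined by $\eta_i$ and $\sigma_i^2$:

\begin{align*}
\eta_i &= E\{h(x)|\omega_i\} \\
       &= V^T E\{X|\omega_i\} + v_0 \\
       &= V^T M_i + v_0, \tag{3.51} \\
\sigma_i^2 &= \mathrm{Var}\{h(x)|\omega_i\} \\
           &= V^T E\{(X - M_i)(X - M_i)^T|\omega_i\}\, V \\
           &= V^T \Sigma_i V. \tag{3.52}
\end{align*}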

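Figure 3.5: Rotated space in which the separation between two means is
maximized.

Differentiating the Fisher criterion with respect to $V$ and $v_0$ gives

\[
\frac{\partial f}{\partial V}
 = \frac{\partial f}{\partial \sigma_1^2} \frac{\partial \sigma_1^2}{\partial V}
 + \frac{\partial f}{\partial \sigma_2^2} \frac{\partial \sigma_2^2}{\partial V}
 + \frac{\partial f}{\partial \eta_1} \frac{\partial \eta_1}{\partial V}
 + \frac{\partial f}{\partial \eta_2} \frac{\partial \eta_2}{\partial V},
\tag{3.53}
\]

\[
\frac{\partial f}{\partial v_0}
 = \frac{\partial f}{\partial \sigma_1^2} \frac{\partial \sigma_1^2}{\partial v_0}
 + \frac{\partial f}{\partial \sigma_2^2} \frac{\partial \sigma_2^2}{\partial v_0}
 + \frac{\partial f}{\partial \eta_1} \frac{\partial \eta_1}{\partial v_0}
 + \frac{\partial f}{\partial \eta_2} \frac{\partial \eta_2}{\partial v_0}.
\tag{3.54}
\]

Calculating the partial derivatives of $\sigma_i^2$ and $\eta_i$ we get

\[ \frac{\partial \sigma_i^2}{\partial V} = 2 \Sigma_i V, \qquad \frac{\partial \eta_i}{\partial V} = M_i, \tag{3.55} \]

\[ \frac{\partial \sigma_i^2}{\partial v_0} = 0, \qquad \frac{\partial \eta_i}{\partial v_0} = 1. \tag{3.56} \]

Substituting in 3.53 and 3.54 and setting them equal to zero yields

\[
2 \left[ \frac{\partial f}{\partial \sigma_1^2} \Sigma_1 + \frac{\partial f}{\partial \sigma_2^2} \Sigma_2 \right] V
 + \frac{\partial f}{\partial \eta_1} M_1 + \frac{\partial f}{\partial \eta_2} M_2 = 0,
\tag{3.57}
\]

\[ \frac{\partial f}{\partial \eta_1} + \frac{\partial f}{\partial \eta_2} = 0. \tag{3.58} \]

Note that the error in $h$-space depends on the direction of the vector $V$, not on its
magnitude; hence, dropping the overall constant and solving for $V$ we obtain

\[ V = \left[ s\, \Sigma_1 + (1 - s)\, \Sigma_2 \right]^{-1} (M_2 - M_1), \]

where

\[
s = \frac{\partial f / \partial \sigma_1^2}{\partial f / \partial \sigma_1^2 + \partial f / \partial \sigma_2^2}.
\]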

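In calculating the optimum projection vector $V$, we have not yet assumed any particular criterion,
so the result obtained is the most general optimum solution for linear
discriminants. In particular, for the Fisher discriminant we have

\[
\frac{\partial f}{\partial \sigma_1^2} = \frac{\partial f}{\partial \sigma_2^2}
 = -\frac{(\eta_1 - \eta_2)^2}{(\sigma_1^2 + \sigma_2^2)^2}.
\tag{3.59}
\]

Therefore, for the Fisher criterion $s = \frac{1}{2}$ and the optimum $V$ is

\[ V = \left[ \tfrac{1}{2} \Sigma_1 + \tfrac{1}{2} \Sigma_2 \right]^{-1} (M_2 - M_1), \tag{3.60} \]

and the Fisher discriminant is

\[ h(x) = (M_1 - M_2)^T \left[ \tfrac{1}{2} \Sigma_1 + \tfrac{1}{2} \Sigma_2 \right]^{-1} X, \tag{3.61} \]

where the overall sign, which is a matter of convention, has been chosen so that $h(x)$ is on
average larger for class $\omega_1$. Identifying $\omega_1$ with the signal and $\omega_2$ with the background,
we can write the above expression explicitly, up to an overall constant, as

\[ h(x) = \sum_{i=1}^{N_{in}} \alpha_i\, x_i, \tag{3.62} \]

\[
\alpha_i = \sum_{j=1}^{N_{in}} \left[ \left( \Sigma(\mathrm{sig}) + \Sigma(\mathrm{bak}) \right)^{-1} \right]_{ij}
           \left( M_j(\mathrm{sig}) - M_j(\mathrm{bak}) \right).
\tag{3.63}
\]

Here sig and bak refer to signal and background. The covariances have the explicit
form of

\begin{align*}
\Sigma_{ij} &= E\{(x_i - M_i)(x_j - M_j)\} \\
            &= \frac{1}{N} \sum_{n=1}^{N} (x_{in} - M_i)(x_{jn} - M_j) \\
            &= \frac{1}{N} \sum_{n=1}^{N} x_{in}\, x_{jn} - M_i M_j. \tag{3.64}
\end{align*}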

Equation 3.62 indicates that Fisher discriminants are analogous to single-layer,
single-unit neural networks [31], with the $\alpha$ coefficients playing the role of connection
weights. Here, as with the network, one inputs a set of $N_{in}$ variables, $x_i$, and there
is one output. However, in this case $h(x)$ is a linear function of the inputs, and
there is no nonlinear activation function. One identifies the
Fisher coefficients using the sample data, which is similar to training the network.
The difference is that here one calculates the coefficients by inverting the $N_{in} \times N_{in}$
matrix given in Equation 3.63.
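
The calculation of the Fisher coefficients from sample data can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this work: the coefficients are obtained by solving the linear system $(\Sigma(\mathrm{sig}) + \Sigma(\mathrm{bak}))\,\alpha = M(\mathrm{sig}) - M(\mathrm{bak})$ with Gaussian elimination, which is equivalent to applying the inverse matrix, the covariances are estimated with a $1/N$ normalization, and all function names and sample points are illustrative.

```python
def mean_vector(samples):
    """Sample mean M_i of each input variable."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def covariance_matrix(samples):
    """Sample covariance with 1/N normalization."""
    n = len(samples)
    m = mean_vector(samples)
    dim = len(m)
    return [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / n
             for j in range(dim)] for i in range(dim)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x

def fisher_coefficients(sig, bak):
    """alpha = (Sigma(sig) + Sigma(bak))^{-1} (M(sig) - M(bak))."""
    m_s, m_b = mean_vector(sig), mean_vector(bak)
    c_s, c_b = covariance_matrix(sig), covariance_matrix(bak)
    dim = len(m_s)
    total = [[c_s[i][j] + c_b[i][j] for j in range(dim)] for i in range(dim)]
    diff = [m_s[i] - m_b[i] for i in range(dim)]
    return solve(total, diff)

def fisher_output(alpha, x):
    """Linear discriminant h(x) = sum_i alpha_i x_i."""
    return sum(a * xi for a, xi in zip(alpha, x))

sig = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
bak = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
alpha = fisher_coefficients(sig, bak)
```

For these illustrative samples both classes have the same diagonal covariance, so the coefficients come out proportional to the difference of the means, and the discriminant output is larger for the signal points than for the background points.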


CHAPTER  4 
ENHANCING  THE  HIGGS  BOSON  SIGNAL 

4.1    Higgs  Decay  Processes 

The Standard Model (SM) in its minimal form predicts the existence of the
Higgs boson as a manifestation of electroweak symmetry breaking [32, 33]. The
experimental data accumulated so far support the SM as the theory of fundamental
interactions. The final remaining hurdle in its verification is discovering the Higgs
boson itself. Our goal is to use neural networks to detect the SM Higgs at the
future Large Hadron Collider (LHC), which will be built at CERN in Switzerland. In
particular, we want to use neural network classifiers as a tool for better discrimination
between signal and background [34]. Since the Higgs mass is not predicted by
the Standard Model, we will limit ourselves to a specific case. More precisely,
we shall analyze a Higgs mass of 400 GeV and study its $ZZ$ decay channel.

We will use neural networks to help separate the $ZZ \to \ell^+\ell^- jj$ signal produced
by the decay of a 400 GeV Higgs boson at a proton-proton collider energy of 15 TeV
from the "ordinary" QCD Z+jets background. We first consider the ideal case
where only one event at a time enters the detector (no pile-up). Next, we examine
the more realistic case of multiple interactions per beam crossing (pile-up). We
will show that in both cases, when used in conjunction with the standard cuts,
neural networks provide an additional signal-to-background enhancement.

As we discussed in Chapter 3, neural networks are ideal tools for classification
problems. In this chapter we investigate in more detail the capabilities of neural networks
as a tool for high energy collider phenomenology. The great challenge
at hadron colliders is to disentangle any new physics that may be present from the
"ordinary" QCD background. Hadron collider events can be very complicated, and
quite often the signal is hidden beneath the background.
In addition, there are many variables that describe a high energy collider event, and
it is not always obvious which variables best isolate the signal or precisely what
data selection (or cuts) optimally enhances the signal over the background. Here
neural networks are an excellent tool, since they are ideal for separating patterns
into categories (e.g., signal and background). We will "train" a network to distinguish
between signal and background using a large number of variables to describe
each event. The network computes a single variable that ranges from zero to one.
If the training is successful, the network will output a number near one for a signal
event and near zero for a background event, and a single cut can be made on the
network output which will enhance the signal over the background.

An important final state at hadron colliders consists of a large transverse momentum
charged lepton pair plus two accompanying jets (i.e., $\ell^+\ell^- jj$). It is one of
the relevant signals for the production of a Higgs particle and its subsequent decay
into $ZZ$, with one $Z$ decaying leptonically and the other $Z$ decaying hadronically
into a $q\bar{q}$ pair which then manifests itself as a pair of jets. The predominant background
for this process is the production of a single large transverse momentum $Z$ boson, where the
associated jets mimic the Higgs boson signal. Requiring the $Z$ boson to have a large
transverse momentum by demanding a large $P_T$ lepton pair forces the background
to have a large $P_T$ "away-side" quark or gluon via subprocesses like $qg \to Zq$
or $q\bar{q} \to Zg$. This away-side parton often fragments via gluon bremsstrahlung,
producing away-side jet pairs which resemble the signal. In this chapter, we use
neural networks to help distinguish the $ZZ \to \ell^+\ell^- jj$ decay of a 400 GeV Higgs
boson signal from the Z+jets background in proton-proton collisions at 15 TeV.
The neural network will be used in conjunction with the standard data cuts to
provide additional signal-to-background enhancement. The discovery mode for a
Higgs boson of this mass at a hadron collider is the "gold-plated" four-lepton decay,
$ZZ \to \ell^+\ell^-\ell^+\ell^-$. Here we investigate whether neural networks can help with
the "jet physics" of the $\ell^+\ell^- jj$ mode, particularly in the environment of multiple
interactions per beam crossing (i.e., pile-up). Also, progress made here can be
carried over to the $WW \to \ell\nu jj$ decay mode of the Higgs boson [35, 36].

We will not try to give a detailed simulation of an experiment at the LHC. Higgs
boson production at a 15 TeV proton-proton collider is used as an illustration of
neural networks as a tool in high energy jet phenomenology. We have designed,
constructed, and tested the networks presented here from the beginning with the
emphasis on high energy data analysis. We begin in Section 4.2 by discussing event
generation for the ideal case where only one event at a time enters the detector
(no pile-up). In Section 4.3, we discuss the data selection and cuts used to preprocess
the data. In Section 4.4, we discuss the type of neural networks and variables used to
analyze the data. Application of the Fisher discriminant to our data is discussed
in Section 4.5. We discuss the network cut-off in Section 4.6. In Section 4.7,
we examine the case of multiple interactions per beam crossing (pile-up). Finally,
in Section 4.8 we discuss the network performance.

4.2    Event  Generation  Without  Pile-up 

We consider first the ideal case where only one event at a time enters the detector.
We want to determine whether neural networks can be trained to distinguish
between the Higgs boson signal and the Z+jets background when there is no pile-up.
ISAJET version 7.06 is used to generate Higgs bosons with a mass of 400 GeV
in 15 TeV proton-proton collisions. The generated width of the Higgs is about
30 GeV. The Higgs boson is forced to decay into two $Z$ bosons, with one $Z$
decaying leptonically and the other $Z$ decaying into a quark-antiquark pair. We
refer to this as the "signal". The "background" consists of single $Z$ boson events
generated with the hard-scattering transverse momentum of the $Z$, $k_T$, greater
than 100 GeV. Single $Z$ bosons are produced at large transverse momentum via
the "ordinary" QCD subprocesses $qg \to Zq$, $\bar{q}g \to Z\bar{q}$, and $q\bar{q} \to Zg$. These
subprocesses, of course, generate additional gluons via bremsstrahlung off both incident
and outgoing color non-singlet partons, resulting in multiparton final states which
subsequently fragment into hadrons; this is referred to as the Z+jets background.

We  are  not  attempting  to  do  a  detailed  simulation  of  an  LHC  detector  [2,3]. 
Events are analyzed by dividing the solid angle into "calorimeter" cells having size
$\Delta\eta \times \Delta\phi = 0.2 \times 15^\circ$, where $\eta$ and $\phi$ are the
pseudorapidity and azimuthal angle, respectively. A single cell has an energy (the
sum of the energies of all the particles that hit the cell, excluding neutrinos) and
a direction given by the coordinates of the center of the cell. The transverse
energy of each cell is computed from the cell energy and direction. Large
transverse momentum leptons are analyzed separately and are not included when
computing the energy of a cell. Jets are defined using a simple algorithm. One
first considers the "hot" cells (those with transverse energy greater than 5 GeV).
Cells are combined to form a jet if they lie within a specified "distance" or
"radius", $R^2 = \Delta\eta^2 + \Delta\phi^2$, in $\eta$-$\phi$ space from each other.
Jets have an energy $E_j$ given by the sum of the energies of the cells in the
cluster and a momentum $\vec{p}_j$ given by the vector sum of the momenta of the
cells. The invariant mass of a jet is simply $M_j^2 = E_j^2 - \vec{p}_j \cdot \vec{p}_j$.
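The cell and jet kinematics described above can be illustrated with a short sketch. It assumes each cell is treated as a massless object pointing at the cell center, so that $E_T = E/\cosh\eta$; the text only says that the transverse energy follows from the cell energy and direction, so this convention and the function names are assumptions made for illustration.

```python
import math

def cell_four_vector(energy, eta, phi):
    """Four-vector (E, px, py, pz) of a cell, assumed massless (|p| = E)."""
    et = energy / math.cosh(eta)  # transverse energy of the cell
    return (energy, et * math.cos(phi), et * math.sin(phi), et * math.sinh(eta))

def jet_mass(cells):
    """Invariant mass of a jet built from (energy, eta, phi) cells.

    The jet energy is the sum of the cell energies and the jet momentum is
    the vector sum of the cell momenta, so M^2 = E^2 - p.p.
    """
    e = px = py = pz = 0.0
    for energy, eta, phi in cells:
        ce, cx, cy, cz = cell_four_vector(energy, eta, phi)
        e += ce
        px += cx
        py += cy
        pz += cz
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative round-off
```

A jet made of a single cell has zero mass in this picture; a non-zero jet mass arises only from the angular spread of the cells in the cluster.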

We  have  taken  the  energy  resolution  to  be  perfect,  which  means  that  the  only 
resolution  effects  are  caused  by  the  lack  of  spatial  resolution  due  to  the  cell  size. 
However, we are using a very crude calorimeter with large cells (960 cells with
$|\eta| < 4$). Experiments at, for example, the LHC [3,4] will have considerably
smaller cell size and hence better spatial resolution. Even with the addition of
energy resolution effects, the combined spatial and energy resolution at the LHC
should be comparable to or better than in our analysis.

4.3    Data  Selection  and  Cuts 

Our "zero-level" trigger is designed to select large transverse momentum Z
bosons that have decayed into charged leptons. The first cut is made by demanding
that the event contain at least two high transverse momentum leptons ($\ell = e$
or $\mu$) in the central region as follows:

• $P_T(\ell) > 25$ GeV, $|\eta(\ell)| < 2.5$.

Lepton pairs ($e^+e^-$ and $\mu^+\mu^-$) are constructed for the events that survive
this first cut. The pairs are ordered according to their invariant mass, with
pair #1 having the mass closest to the Z boson mass, pair #2 being the second
closest, etc. Finally, the event is rejected unless at least one lepton pair
satisfies the following:

• $P_T(\ell^+\ell^-) > 100$ GeV.

Table 4.1 shows that for a 400 GeV Higgs at 15 TeV, roughly 10,000 events
per year pass this "zero level" trigger. Here the integrated luminosity for
one year is taken to be the expected LHC value of $10^5$/pb. About 2 million
background events per year survive this "zero level" lepton cut.

This high transverse momentum lepton-pair cut is, of course, crucial. The
transverse momentum spectrum of the single Z QCD background falls off rapidly,
while for the heavy Higgs the signal is peaked at about half the mass of the
Higgs. Here one wants to take as large a cut on $P_T(\ell^+\ell^-)$ as possible
without losing too much of the signal. However, even with this cut, the
background is still nearly 200 times the signal!

The jet topology of events with at least one large transverse momentum lepton
pair is analyzed by first examining only jet cores (i.e., narrow jets of size
$R_j$(core)). Here one includes only those jet cores satisfying

• $E_T$(jet core) $> 25$ GeV, $|\eta$(jet core)$| < 3$,

with

• $R_j$(core) $= 0.2$.

In an attempt to find the two jets produced by the hadronic decay of the large
transverse momentum Z boson, jet pairs are formed by demanding that the
distance between the two jet cores in $\eta$-$\phi$ space,
$d_{jj} = \sqrt{(\eta_1 - \eta_2)^2 + (\phi_1 - \phi_2)^2}$, be less than 1.6. Namely,

• $d_{jj}$(jet-jet cores) $< 1.6$.

In addition, the jet-jet cores are required to satisfy

• $P_T^{jj} > 100$ GeV, $|\phi_{jj} - \phi_{\ell\ell}| > 90^\circ$,

where $P_T^{jj}$ is the total transverse momentum of the core jet-pair and
$|\phi_{jj} - \phi_{\ell\ell}|$ is the azimuthal angle between the leading lepton
pair and the core jet-pair. The jet-pair is required to be in the opposite
hemisphere (or "away-side") from the lepton pair. If more than one jet-pair meets
all of these requirements, then the pair with the largest total transverse energy
is selected.
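The jet-pair selection just described can be sketched as follows. This is only an illustration under stated assumptions: jet cores are given as hypothetical (E_T, eta, phi) tuples that already pass the core cuts, the pair transverse momentum is approximated by the vector sum of the core transverse energies, and $d_{jj}$ is taken to be the ordinary distance in $\eta$-$\phi$ space.

```python
import math
from itertools import combinations

def delta_phi(phi1, phi2):
    """Azimuthal separation folded into the range [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d

def select_jet_pair(cores, phi_ll, d_max=1.6, pt_min=100.0):
    """Return the away-side core pair with the largest total E_T, or None.

    cores  : list of (et, eta, phi) tuples for jet cores passing the core cuts
    phi_ll : azimuthal angle of the leading lepton pair (radians)
    """
    best = None
    for a, b in combinations(cores, 2):
        d_jj = math.hypot(a[1] - b[1], delta_phi(a[2], b[2]))
        if d_jj >= d_max:
            continue
        # Transverse momentum of the pair from the vector sum of the cores.
        px = a[0] * math.cos(a[2]) + b[0] * math.cos(b[2])
        py = a[0] * math.sin(a[2]) + b[0] * math.sin(b[2])
        if math.hypot(px, py) <= pt_min:
            continue
        # Require the pair to be in the hemisphere opposite the lepton pair.
        if delta_phi(math.atan2(py, px), phi_ll) <= math.pi / 2.0:
            continue
        et_sum = a[0] + b[0]
        if best is None or et_sum > best[0]:
            best = (et_sum, a, b)
    return None if best is None else (best[1], best[2])
```

Ranking the surviving pairs by total transverse energy follows the rule stated in the text; any other tie-breaking detail is not specified there.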

Table 4.1 shows that of the 10,000 signal events passing the "zero level" lepton
trigger about 49% also pass the jet-pair selection criterion. Unfortunately, about
30% of the ordinary Z+jets background events that survive the "zero level" lepton
trigger also have a jet-pair meeting the selection criteria.




Figure 4.1: Shows the away-side jet-jet mass for a 400 GeV Higgs boson produced
in 15 TeV p-p collisions. The plot corresponds to the number of events per year
(Lum = $10^5$/pb) in a 10 GeV bin for the $H \to ZZ$ signal and the Z+jets
background. The ideal case where only one event at a time enters the detector (no
pile-up) and the case of multiple interactions per beam crossing (with pile-up) are
shown. In all cases the events have survived the "zero-level" lepton trigger and
the jet-pair selection criterion.


Here it is useful to define two quantities that measure the effectiveness of a
particular cut. The "enhancement factor" is defined as the percentage of signal
divided by the percentage of background that survives the cut. Namely,

$$F_{enh} = \frac{\%\ \text{of signal surviving cut}}{\%\ \text{of background surviving cut}}.$$

The efficiency of a cut is defined as the percentage of signal that survives the cut,

$$F_{eff} = \%\ \text{of signal surviving cut}.$$

The jet-pair selection criterion results in an enhancement of 1.6 with an efficiency
of about 49%. The "zero level" lepton trigger is used as a reference point and
is normalized to an efficiency of 100% and an enhancement of one. One might
have expected to do better at this stage. However, once we require that the Z
boson have a large transverse momentum, we force the background to have a
large $P_T$ away-side quark or gluon jet. This away-side parton often fragments via
gluon bremsstrahlung into multiple away-side jets which then survive the selection
criteria.
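As a worked check of these two definitions, the short sketch below recomputes the efficiency and enhancement of the jet-pair selection from the event counts listed in Table 4.1. The function name is hypothetical; the four event counts are the ones quoted in the table.

```python
def cut_performance(sig_before, sig_after, bak_before, bak_after):
    """Return (efficiency, enhancement) of a cut from event counts."""
    efficiency = sig_after / sig_before       # fraction of signal surviving
    bak_fraction = bak_after / bak_before     # fraction of background surviving
    return efficiency, efficiency / bak_fraction

# Jet-pair selection relative to the "zero level" lepton trigger (Table 4.1).
eff, enh = cut_performance(10185, 4995, 1961818, 595622)
print(f"efficiency = {100 * eff:.1f}%, enhancement = {enh:.1f}")
```

The same function reproduces the "relative" columns of the table when the counts before the cut are taken from the preceding row rather than from the lepton trigger row.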


                              Signal H -> ZZ          Background Z+jets                  Enhancement factor
Selection cut               % overall  Events/year   % overall  Events/year   Back/Sig   Relative   Overall

Lepton trigger                100%        10185        100%       1961818       193         1.0        1.0
  P_T(l) > 25 GeV
  P_T(ll) > 100 GeV
Jet pair selection            49.0%        4995        30.4%       595622       119         1.6        1.6
  E_T(j) > 25 GeV
  P_T(jj) > 100 GeV
Z-mass cut                    25.0%        2551         2.3%        44244        17         6.9       11.1
  81 < M_Z < 101 GeV
H-mass cut                    22.0%        2241         0.7%        14471       6.5         2.7       29.8
  350 < M_H < 450 GeV
Z-mass & net cut              10.4%        1060         0.2%         3683       3.5         5.0       55.4
  81 < M_Z < 101 GeV
  net cut > 0.75
H-mass & net cut               9.4%         954         0.1%         1862       2.0         3.3       98.7
  350 < M_H < 450 GeV
  net cut > 0.75

Table 4.1: 400 GeV Higgs boson produced in 15 TeV p-p collisions. The table
shows the number of events per year (with Lum = $10^5$/pb) for the $H \to ZZ$ signal
and Z+jets background for the ideal case where only one event at a time enters the
detector (i.e., no pile-up). The "zero-level" lepton trigger is used as a reference
point and is normalized to 100%. The enhancement factor is defined to be the
percentage of signal divided by the percentage of background surviving the given
set of cuts.


The invariant mass, $M_{jj}$(full), is constructed by using all cells that lie within
a "distance" $R_{jj}$(full) in $\eta$-$\phi$ space of either of the two jets. Cells are
not double counted. For example, a cell may lie within $R_{jj}$(full) of both jets;
nevertheless it is counted just once. The aim here is, of course, to reconstruct the
invariant mass of the Z boson as shown in Figure 4.1. However, this full jet-jet
invariant mass will only be used in the event selection. The Higgs mass will be
reconstructed by setting $M_{jj} = M_Z$. At this stage, events are rejected unless the
full jet-jet mass satisfies:

• $81 < M_{jj}$(full) $< 101$ GeV, with

• $R_{jj}$(full) $= 0.6$.

[Figure 4.2 plot: "Reconstructed Higgs Mass", events per year in a 25 GeV bin versus
mass (GeV), for the Higgs -> ZZ signal and the Z+jets background.]

Figure 4.2: Shows the reconstructed mass of a 400 GeV Higgs boson produced in 15
TeV p-p collisions. The plot corresponds to the number of events per year (Lum =
$10^5$/pb) in a 25 GeV bin for the $H \to ZZ$ signal and the Z+jets background for
the ideal case where only one event at a time enters the detector (no pile-up).
The events have survived the "zero-level" lepton trigger and the jet-pair selection
criterion with $81 < M_{jj}$(full) $< 101$ GeV. No network cut has been made.

As can be seen in Figure 4.1 and Table 4.1, about 51% of the Higgs signal events
passing both the lepton cut and the jet-pair selection have $M_{jj}$ within 10 GeV of
the Z boson mass. On the other hand, only about 7% of the Z+jets background
events surviving both the lepton cut and the jet-pair selection have a full jet-pair
invariant mass within 10 GeV of the Z boson mass. This corresponds to an overall
enhancement factor at this stage of about 11 with an overall efficiency of about
25%. The background lies well above the signal in Figure 4.1 so that one cannot
directly see the Z mass peak. Nevertheless, the jet-jet invariant mass cut is very
important.

The  Higgs  invariant  mass  is  constructed  from  the  momentum  vectors  of  the 
two  charged  leptons  and  the  momentum  vector  of  the  jet-pair  as  follows: 

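$$M_H^2 = (E_{\ell^+} + E_{\ell^-} + E_{jj})^2 - (\vec{p}_{\ell^+} + \vec{p}_{\ell^-} + \vec{p}_{jj})^2,$$

where

$$E_{jj}^2 = \vec{p}_{jj} \cdot \vec{p}_{jj} + M_Z^2.$$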

The  mass  of  a  jet  is  not  a  well  defined  quantity  since  it  depends  on  the  soft  particles. 
The  momentum  vector  of  a  jet  is  better  defined  and  is  determined  primarily  by 
the  core  cells.  Thus,  in  constructing  the  Higgs  mass  we  use  the  momentum  vector 
of  the  jet-pair  but  not  the  jet-pair  mass.  The  mass  of  the  jet-pair  is  set  equal  to 
the  mass  of  the  Z  boson. 
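A minimal sketch of this reconstruction is given below. It assumes the jet-pair energy is obtained from the measured jet-pair momentum with the pair mass fixed to the Z boson mass, as stated in the text; the numerical value used for $M_Z$ and the function name are assumptions made for illustration.

```python
import math

M_Z = 91.19  # GeV; Z boson mass, value assumed here for illustration

def higgs_mass(lep_plus, lep_minus, p_jj, m_z=M_Z):
    """Reconstructed Higgs mass with the jet-pair mass set equal to m_z.

    lep_plus, lep_minus : lepton four-vectors (E, px, py, pz)
    p_jj                : jet-pair momentum vector (px, py, pz)
    """
    # Jet-pair energy from its momentum and the imposed Z mass.
    e_jj = math.sqrt(sum(c * c for c in p_jj) + m_z * m_z)
    energy = lep_plus[0] + lep_minus[0] + e_jj
    momentum = [lep_plus[i + 1] + lep_minus[i + 1] + p_jj[i] for i in range(3)]
    m2 = energy * energy - sum(c * c for c in momentum)
    return math.sqrt(max(m2, 0.0))
```

Only the direction and magnitude of the jet-pair momentum enter, so soft particles that shift the jet-pair mass have a reduced effect on the reconstructed Higgs mass.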

Figure 4.2 shows the reconstructed Higgs mass for both the signal and background
events that have passed the lepton cuts, the jet-pair selection, and have
$81 < M_{jj}$(full) $< 101$ GeV. At this stage, there are about 2,000 Higgs boson events
and 14,000 QCD background events per year within 50 GeV of the true Higgs mass
of 400 GeV. This corresponds to an overall enhancement factor of about 30 (see
Table 4.1) with an overall efficiency of about 22%. However, even with this
enhancement the Z+jets background is still more than 6 times the signal. It is at
this stage that neural networks will be used to provide an additional enhancement
of signal over background.




4.4    Network  Analysis  Without  Pile-up 

Recall that a neural network can be thought of as a mapping function with a set
of $N_{in}$ inputs, {x}, which can take any value, and one or more outputs. In our case
we consider a network with a single output, $z_{net}$, which is restricted to the range
$0 < z_{net} < 1$. The net output is a function of the input set {x} and the network
"memory" parameters {w} as follows:

$$z_{net} = F_{net}(\{x\}, \{w\}).$$

The  goal  is  to  train  a  network  that  can  distinguish  between  two  patterns  of  input 
data,  "signal"  events  and  "background"  events,  where  each  event  is  characterized 
by  the  Nin  variables.  A  "perfect"  network  responds  with  znet  near  one  for  a  signal 
input  and  with  znet  near  zero  for  a  background  input.  The  networks  we  will  be 
using  are  far  from  perfect  and  the  net  outputs  will  vary  from  zero  to  one  for  both 
the signal and the background events. To characterize the performance of the
network on a sample of $N_{sig}$ signal events and $N_{bak}$ background events we define
the network "error function" as follows:

$$\chi^2_{net} = \frac{1}{2N_{sig}} \sum_{n=1}^{N_{sig}} (z_{net}(n) - 1)^2 + \frac{1}{2N_{bak}} \sum_{n=1}^{N_{bak}} (z_{net}(n) - 0)^2,$$

where $z_{net}(n)$ in the first and second summation is the network response for the nth
signal and background event, respectively. This quadratic error function ranges
from zero to one. It is equal to zero for a "perfect" network and is equal to 0.25
for a network that responds with $z_{net} = 0.5$ for both signal and background (i.e.,
a "dumb" network).
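The limiting values quoted above can be checked with a few lines of code. The sketch assumes a normalization of one half on each sum, chosen so that the function ranges from zero to one and equals 0.25 for a network that always answers 0.5, as the text states; the function name is hypothetical.

```python
def chi2_net(z_signal, z_background):
    """Quadratic network error for lists of network outputs.

    Signal events have target 1 and background events have target 0. Each
    class is averaged separately and weighted by 1/2 (assumed normalization).
    """
    sig = sum((z - 1.0) ** 2 for z in z_signal) / (2.0 * len(z_signal))
    bak = sum((z - 0.0) ** 2 for z in z_background) / (2.0 * len(z_background))
    return sig + bak

print(chi2_net([1.0, 1.0], [0.0, 0.0]))    # "perfect" network
print(chi2_net([0.5] * 4, [0.5] * 6))      # "dumb" network
```

Because each class is averaged separately, the value does not depend on the relative sizes of the signal and background samples.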

We will train a neural network to distinguish between the signal and background
events that have already passed the lepton cuts, the jet-pair selection, and have
$81 < M_{jj}$(full) $< 101$ GeV. These important cuts are made before sending the
events to the network. Even though both the signal and background events have
survived these cuts, there is still additional information in the events that is not
the same for the signal and the background. The network can use these differences
to further help distinguish signal from background.

[Figure 4.3 plot: "Jet Multiplicity", percentage of events versus number of jets with
$E_T > 5$ GeV, for the Higgs -> ZZ signal and the Z+jets background.]

Figure 4.3: Shows the multiplicity of jets for 400 GeV Higgs bosons produced in 15
TeV p-p collisions. The plot corresponds to the percentage of events with $N$ jets
with $E_T$ greater than 5 GeV for the $H \to ZZ$ signal and the Z+jets background
for the ideal case where only one event at a time enters the detector (no pile-up).
The events have survived the "zero-level" lepton trigger and the jet-pair selection
criterion with $81 < M_{jj}$(full) $< 101$ GeV.

Of course, the key to a good network lies in the selection of the input variables.
These variables must characterize the differences between the signal and
the background. In this analysis we choose the following nine input variables:

$$\begin{aligned}
x_1 &= d_{jj}, \\
x_2 &= |E_T(1) - E_T(2)| / (E_T(1) + E_T(2)), \\
x_3 &= N_{jet}(E_T > 5\ \text{GeV}), \\
x_4 &= E_T(R_{jj} < 0.2) / E_T(R_{jj} < 1.0), \\
x_5 &= E_T(0.2 < R_{jj} < 0.6) / E_T(R_{jj} < 1.0), \\
x_6 &= E_T(0.6 < R_{jj} < 1.0) / E_T(R_{jj} < 1.0), \\
x_7 &= M(R_{jj} < 0.2) / M(R_{jj} < 1.0), \\
x_8 &= M(0.2 < R_{jj} < 0.6) / M(R_{jj} < 1.0), \\
x_9 &= M(0.6 < R_{jj} < 1.0) / M(R_{jj} < 1.0).
\end{aligned}$$

The first variable is simply the distance in $\eta$-$\phi$ space between the two
"away-side" jets selected in the jet-pair selection. For the signal this is related
to the opening angle of the quark-antiquark pair resulting from the $Z \to q\bar{q}$
decay, while for the background this is the distance between, for example, an
outgoing quark and the radiated gluon jet. The second variable is the "skewness" of
the transverse energies of the two jet cores, while the third variable is simply the
overall number of jets (with $E_T > 5$ GeV) in the event and is shown in Figure 4.3.

The remaining variables depict the precise manner in which transverse energy
and mass are distributed around the away-side jet-pair. For example, $x_6$ is the
ratio of the amount of transverse energy coming from calorimeter cells within the
"halo" region $0.6 < R_{jj} < 1.0$ surrounding both jets to the total transverse energy
of the extended jet-pair ($R_{jj}$(extended) $= 1.0$). As can be seen in Figure 4.4, the
fraction of transverse energy in this region is, on the average, slightly larger for the
background than for the signal. Similarly, $x_9$ is the fraction of the extended jet-jet
invariant mass that comes from calorimeter cells in the "halo" region
$0.6 < R_{jj} < 1.0$. Figure 4.5 shows that more of the extended jet-jet mass lies in this
region for the background than for the signal. The other halo regions also show
slight variations between signal and background which the network can use to help
distinguish between the two.

[Figure 4.4 plot: "Transverse Energy Fraction", percentage of events versus
$E_T(0.6 < R < 1.0) / E_T(R < 1.0)$, for the Higgs -> ZZ signal and the Z+jets
background.]

Figure 4.4: Shows the fraction of transverse energy coming from calorimeter cells
within the "halo" region $0.6 < R_{jj} < 1.0$ surrounding either of the away-side jets.
The plot corresponds to the percentage of events with the jet-jet transverse energy
fraction within the 0.025 bin for the $H \to ZZ$ signal and the Z+jets background
for the ideal case where only one event at a time enters the detector (no pile-up).
The events have survived the "zero-level" lepton trigger and the jet-pair selection
criterion and have $81 < M_{jj}$(full) $< 101$ GeV.
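The transverse-energy fractions $x_4$, $x_5$ and $x_6$ can be sketched as below. The sketch assumes that $R_{jj}$ for a cell is its distance in $\eta$-$\phi$ space to the nearer of the two jet axes, which counts every cell exactly once; the cell and jet tuples and the function names are hypothetical.

```python
import math

def eta_phi_distance(eta1, phi1, eta2, phi2):
    """Distance in eta-phi space with the azimuthal difference folded to [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)

def et_fractions(cells, jets, edges=(0.2, 0.6, 1.0)):
    """Fractions of transverse energy in the rings R < 0.2, 0.2-0.6 and 0.6-1.0.

    cells : list of (et, eta, phi) calorimeter cells
    jets  : the two away-side jet axes as (eta, phi)
    The fractions are normalized to the total E_T within R < 1.0.
    """
    sums = [0.0] * len(edges)
    for et, eta, phi in cells:
        # Distance to the nearer jet, so a cell is never double counted.
        r = min(eta_phi_distance(eta, phi, j_eta, j_phi) for j_eta, j_phi in jets)
        for i, edge in enumerate(edges):
            if r < edge:
                sums[i] += et
                break
    total = sum(sums)
    return [s / total for s in sums] if total > 0.0 else [0.0] * len(edges)
```

The mass fractions $x_7$ to $x_9$ would follow the same ring assignment, with the invariant mass of the cells in each region replacing the summed transverse energy.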

The idea here is similar to the jet-jet profile analysis we presented in ??. For
the signal, the away-side jet-pair arises from the $q\bar{q}$ decay of a large transverse
momentum Z boson. The Z boson is a color singlet and does not radiate gluons
during flight. On the other hand, the large $P_T$ away-side recoil quarks or gluons in
the single Z background are not color singlets and produce additional gluons via
bremsstrahlung. These radiated gluons deposit transverse energy around the
jet-jet cores. This results in more transverse energy and invariant mass surrounding
the jet-jet cores for the Z+jets background than for the Higgs boson signal. The
distribution of transverse energy and invariant mass around the "away-side"
jet-pair is slightly different in the two cases.

[Figure 4.5 plot: "Mass Fraction", percentage of events versus
$M(0.6 < R < 1.0) / M(R < 1.0)$, for the Higgs -> ZZ signal and the Z+jets background.]

Figure 4.5: Shows the fraction of invariant mass coming from calorimeter cells
within the "halo" region $0.6 < R_{jj} < 1.0$ surrounding either of the away-side jets.
The plot corresponds to the percentage of events with the jet-jet invariant mass
fraction within the 0.05 bin for the $H \to ZZ$ signal and the Z+jets background
for the ideal case where only one event at a time enters the detector (no pile-up).
The events have survived the "zero-level" lepton trigger and the jet-pair selection
criterion and have $81 < M_{jj}$(full) $< 101$ GeV.

The network is trained on a sample of 8,348 signal and 7,254 background
events using the nine inputs shown above, where both signal and background
events have already satisfied the lepton cuts, the jet-pair selection, and have
$81 < M_{jj}$(full) $< 101$ GeV. To get this training sample, it was necessary to generate
80,000 Higgs boson events and 800,000 Z+jet events. We experimented with a
variety of network sizes and types and present here the results from a 9-16-8-1 net
which has 305 memory parameters. After a lengthy training process we achieved
$\chi^2_{net} = 0.1678$ on the training sample.
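The count of 305 memory parameters follows from the layer sizes if every non-input node carries one weight per incoming connection plus one threshold. The sketch below checks this count and shows a forward pass; the sigmoid activation and the random initialization are assumptions made for illustration, since the text here does not specify them.

```python
import math
import random

def count_parameters(layers):
    """Weights plus one threshold per node for a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

def init_network(layers, seed=0):
    """Random (weights, thresholds) per layer; initialization scheme assumed."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [rng.uniform(-0.5, 0.5) for _ in range(n_out)])
            for n_in, n_out in zip(layers, layers[1:])]

def forward(network, x):
    """Propagate the inputs through sigmoid nodes and return the single output."""
    for weights, thresholds in network:
        x = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(weights, thresholds)]
    return x[0]

print(count_parameters([9, 16, 8, 1]))
```

With a sigmoid on the output node the response is automatically confined to the open interval between zero and one, matching the range quoted for $z_{net}$.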

Figure 4.6 shows the network response (i.e., $z_{net}$) for the sample of signal and
background events used in the training. The situation is far from ideal. There
are some events around $z_{net} = 0.5$ for which the net cannot distinguish between
signal and background. Nevertheless, the net does allow for some separation of
signal and background. The net clearly recognizes some events as signal or
background, while for other events there is an overlap and the net cannot distinguish
between the two. Ideally one would like a clean separation between the signal
and background in Figure 4.6. One would then perform a network cut-off and
assign any event with $z_{net} > z_{cut}$ to be signal and events with $z_{net} < z_{cut}$ to be
background.

[Figure 4.6 plot: "Network Response" for the 9-16-8-1 (305) net, percentage of events
versus network output, for signal and background in the training sample and in the
independent sample.]

Figure 4.6: Shows the network response, $z_{net}$, for the sample of signal and
background events used in the training and for an independent sample of signal and
background events. The plot corresponds to the percentage of events with $z_{net}$
within a 0.05 bin for the $H \to ZZ$ signal and the Z+jets background for the ideal
case where only one event at a time enters the detector (no pile-up). The events
have survived the "zero-level" lepton trigger and the jet-pair selection criterion
and have $81 < M_{jj}$(full) $< 101$ GeV.

Figure 4.6 also shows the network response (i.e., $z_{net}$) for an independent sample
of signal and background events not used in the training. If the network generalized
perfectly there would be no difference between the response of the network for the
independent and the training samples. The small differences seen in Figure 4.6
reflect the fact that we have trained the net on a relatively small sample of events.
We could improve the ability of the network to generalize by starting with a larger
training sample, but this result is sufficient for what we want to illustrate in this
paper.

The enhancement and efficiency of the network cut-off depend on the value
chosen for $z_{cut}$, where the network enhancement and efficiency are defined as
follows:

$$F^{net}_{enh} = \frac{\%\ \text{of signal with}\ z_{net} > z_{cut}}{\%\ \text{of background with}\ z_{net} > z_{cut}},$$

$$F^{net}_{eff} = \%\ \text{of signal with}\ z_{net} > z_{cut}. \qquad (4.3)$$

The overall network performance can be characterized by the single curve of
the network enhancement versus the network efficiency shown in Figure 4.7. Each
point in Figure 4.7 corresponds to a different choice for the network cut-off with the
lower efficiencies and higher enhancements corresponding to larger values of $z_{cut}$.
In the analysis presented here, we choose $z_{cut} = 0.75$ which for the training sample
corresponds to a relative efficiency of about 42% with a relative enhancement of
about 6.
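The trade-off between efficiency and enhancement as the cut-off is varied can be illustrated with a toy example. The network outputs below are made-up numbers chosen only to show the mechanics and are not taken from the analysis; the function name is hypothetical.

```python
def network_cut(z_signal, z_background, z_cut):
    """Return (efficiency, enhancement) for events with z_net > z_cut."""
    eff = sum(z > z_cut for z in z_signal) / len(z_signal)
    bak = sum(z > z_cut for z in z_background) / len(z_background)
    enh = eff / bak if bak > 0.0 else float("inf")
    return eff, enh

# Toy network outputs (illustrative values only).
z_sig = [0.9, 0.8, 0.7, 0.6, 0.4]
z_bak = [0.8, 0.6, 0.55, 0.3, 0.2, 0.1, 0.1, 0.05, 0.3, 0.4]

for z_cut in (0.25, 0.5, 0.75):
    eff, enh = network_cut(z_sig, z_bak, z_cut)
    print(f"z_cut = {z_cut:.2f}: efficiency = {eff:.2f}, enhancement = {enh:.2f}")
```

Scanning the cut-off in this way traces out one enhancement-versus-efficiency curve, which is how a single trained network is summarized in Figure 4.7.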

4.5    Fisher Discriminants

To see if we have gained very much by using a neural network, we compare
our network response with the simpler method of Fisher discriminants. As
discussed in chapter 3, a Fisher discriminant can be used for separating events into
two classes: signal and background. Here, as with the network, one inputs a set of
$N_{in}$ variables, $x_i$, and there is one output, F. However, in this case F is a linear
function of the inputs,

$$F = \sum_{i=1}^{N_{in}} \alpha_i x_i,$$

where the Fisher coefficients, $\alpha_i$, are chosen to maximize the separation between
signal and background in F-space. The explicit form of the Fisher coefficients is
given in chapter 3.

Figure 4.7: Shows the enhancement versus the efficiency for the training sample
of events for the 9-16-8-1 neural network with 305 memory parameters. Both the
ideal case where only one event at a time enters the detector (no pile-up) and
the case of multiple interactions per beam crossing (pile-up) are shown. Each point
in the plot corresponds to a different choice for the network cut-off with the lower
efficiency and higher enhancements corresponding to larger values of $z_{cut}$. The
network enhancements are compared with the enhancements arrived at by the use
of Fisher discriminants (no pile-up).

In this case, training consists of calculating the Fisher coefficients. Once this is
done the situation is similar to the network. For each input of $N_{in}$ variables there is
one output F. We have determined the Fisher coefficients for the sample of signal
and background events used to train our network and the Fisher response for these
events is shown in Figure 4.8. The separation between signal and background is
not as good as with the network.

[Figure 4.8 plot: "Fisher Response", percentage of events versus Fisher output, for the
Higgs -> ZZ signal and the Z+jets background.]

Figure 4.8: Shows the Fisher response, F, for the sample of signal and background
events used in the training of the neural network. The plot corresponds to the
percentage of events with F within a 0.3 bin for the $H \to ZZ$ signal and the
Z+jets background for the ideal case where only one event at a time enters the
detector (no pile-up). The events have survived the "zero-level" lepton trigger and
the jet-pair selection criterion and have $81 < M_{jj}$(full) $< 101$ GeV.
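For comparison with the network, a linear discriminant of this kind can be computed in a few lines. The sketch below uses the textbook form of the Fisher coefficients, the inverse of the summed within-class scatter matrix applied to the difference of the class means. This is an assumption: the explicit form used in this work is given in chapter 3 and is not reproduced here, and all function names are hypothetical.

```python
def mean_vector(events):
    """Component-wise mean of a list of equal-length input vectors."""
    n = len(events)
    return [sum(e[i] for e in events) / n for i in range(len(events[0]))]

def scatter(events, mu):
    """Within-class scatter matrix: sum over events of (x - mu)(x - mu)^T."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for e in events:
        for i in range(d):
            for j in range(d):
                s[i][j] += (e[i] - mu[i]) * (e[j] - mu[j])
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fisher_coefficients(signal, background):
    """Textbook Fisher coefficients: W^{-1} (mean_signal - mean_background)."""
    mu_s, mu_b = mean_vector(signal), mean_vector(background)
    s_s, s_b = scatter(signal, mu_s), scatter(background, mu_b)
    d = len(mu_s)
    w = [[s_s[i][j] + s_b[i][j] for j in range(d)] for i in range(d)]
    return solve(w, [mu_s[i] - mu_b[i] for i in range(d)])

def fisher_output(alpha, x):
    """Linear Fisher response F = sum_i alpha_i x_i."""
    return sum(a * xi for a, xi in zip(alpha, x))
```

Unlike network training, this calculation needs only the class means and scatter matrices, which is why the Fisher coefficients are so much cheaper to obtain.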

As with the network, the overall Fisher performance can be characterized by the
single curve of the Fisher enhancement versus the Fisher efficiency which is shown
in Figure 4.7 together with the network performance. Each point corresponds
to a different choice for the Fisher cut-off. The best that can be done with the
Fisher method is an enhancement of about 2, whereas the network enhancements
are much higher. However, since the Fisher coefficients are so easy to compute
compared to the network, it is useful to compare to the Fisher performance. In
this way we can judge how much is gained by the added sophistication of the neural
network.

[Figure 4.9 plot: "Reconstructed Higgs Mass", events per year in a 25 GeV bin versus
mass, for the Higgs -> ZZ signal and the Z+jets background.]

Figure 4.9: Shows the reconstructed mass of a 400 GeV Higgs boson produced in
15 TeV p-p collisions. The plot corresponds to the number of events per year (with
Lum = $10^5$/pb) in a 25 GeV bin for the $H \to ZZ$ signal and the Z+jets background
for the ideal case where only one event at a time enters the detector (no pile-up).
The events have survived the "zero-level" lepton trigger and the jet-pair selection
criterion with $81 < M_{jj}$(full) $< 101$ GeV and have passed the network cut-off (i.e.,
have $z_{net} > 0.75$).

4.6    Network  Cut-off 

We now analyze an "independent" sample of events using the trained network
as a tool to help distinguish between signal and background. Figure 4.9 shows
the reconstructed Higgs mass for both the signal and background events that have
passed the "zero level" lepton trigger, the jet-pair selection with
$81 < M_{jj}$(full) $< 101$ GeV, and the network cut-off (with $z_{cut} = 0.75$). Now, there
are about 1,000 Higgs events and 2,000 QCD background events per year within
50 GeV of the true Higgs mass of 400 GeV. This corresponds to an overall
enhancement factor of about 100 (see Table 4.1) with an overall efficiency of about
10%. Figure 4.9 shows that the signal and background are now comparable.
Comparing the reconstructed Higgs boson mass in Figure 4.2 with Figure 4.9 shows
the added enhancement the neural network provides.

An alternative approach to using the network cut-off is to use network weighting.
Here one weights the event with the network response, $z_{net}$, which lies between
zero and one. If the network has been able to separate signal from background
then signal events will be assigned a weight near one and background events will
be assigned a weight near zero.

Figure 4.10 shows the network-weighted reconstructed Higgs mass for both
the signal and background events that have passed the lepton cuts and the jet-pair
selection with $81 < M_{jj}$(full) $< 101$ GeV. The advantage here is that all the signal
events are used (i.e., the relative efficiency is 100%), but in this case the network
cut-off procedure provides a better enhancement of the signal.
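The difference between the two procedures can be made concrete with a small histogramming sketch. Events are represented by hypothetical (mass, z_net) pairs; with a cut-off each surviving event counts once, while with weighting every event contributes its network response. The bin range and the function name are assumptions made for illustration.

```python
def mass_histogram(events, lo=300.0, hi=500.0, width=25.0, z_cut=None):
    """Histogram reconstructed masses with either a network cut or weighting.

    events : list of (mass, z_net) pairs
    z_cut  : if given, events with z_net > z_cut get weight 1 and all others 0;
             if None, every event is weighted by its network response z_net.
    """
    n_bins = int((hi - lo) / width)
    hist = [0.0] * n_bins
    for mass, z in events:
        if not lo <= mass < hi:
            continue
        if z_cut is not None:
            weight = 1.0 if z > z_cut else 0.0
        else:
            weight = z
        hist[int((mass - lo) / width)] += weight
    return hist
```

Weighting keeps every event in the histogram, so no signal is discarded, but background events with intermediate network response still contribute a fraction of their weight.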

4.7    Event  Generation  and  Cuts  With  Pile-up 

We now consider the case of multiple interactions per beam crossing. ISAJET
is used to generate $N_{pile}$ minimum bias events along with each Higgs $\to ZZ$ signal
and each Z+jets background event. The number of pile-up interactions per beam
crossing, $N_{pile}$, that enter the calorimeter is generated according to a Poisson
distribution with a mean of about 29 minimum bias collisions for each Higgs boson
or Z+jets event as shown in Figure 4.11. The mean of 29 collisions per beam
crossing was arrived at by using a bunch crossing time of 25 ns, a peak luminosity
of $10^{34}$ cm$^{-2}$ s$^{-1}$, and the ISAJET minimum bias cross section at 15 TeV of 116
mb. Our mean number is slightly larger than the 20 collisions per beam crossing
quoted for the LHC.
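The mean of 29 interactions per crossing is simply the product of the luminosity, the cross section and the bunch crossing time. The sketch below reproduces that arithmetic and draws Poisson-distributed values of $N_{pile}$; the sampling routine is a generic textbook method chosen for illustration and is not taken from ISAJET.

```python
import math
import random

def mean_pileup(luminosity_cm2_s, cross_section_mb, crossing_time_ns):
    """Mean number of interactions per crossing: L * sigma * dt."""
    sigma_cm2 = cross_section_mb * 1.0e-27   # 1 mb = 1e-27 cm^2
    return luminosity_cm2_s * sigma_cm2 * crossing_time_ns * 1.0e-9

def sample_poisson(mean, rng):
    """Draw one Poisson variate by multiplying uniforms (Knuth's method).

    Adequate for means of a few tens, where exp(-mean) is still representable.
    """
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

mu = mean_pileup(1.0e34, 116.0, 25.0)
rng = random.Random(1)
n_pile = [sample_poisson(mu, rng) for _ in range(5)]
print(f"mean = {mu:.1f}, sampled N_pile = {n_pile}")
```

Each sampled value would give the number of minimum bias events to overlay on one signal or background event.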


[Figure 4.10 plot: "Reconstructed Higgs Mass", network weighted, events per year in a
25 GeV bin versus mass (GeV), for the Higgs -> ZZ signal and the Z+jets background.]

Figure 4.10: Shows the reconstructed mass of a 400 GeV Higgs boson produced in
15 TeV p-p collisions weighted by the network output, $z_{net}$. The plot corresponds
to the weighted number of events per year (with Lum = $10^5$/pb) in a 25 GeV bin for
the $H \to ZZ$ signal and the Z+jets background for the ideal case where only one
event at a time enters the detector (no pile-up). The events have survived the
"zero-level" lepton trigger and the jet-pair selection criterion with
$81 < M_{jj}$(full) $< 101$ GeV.


These pile-up interactions greatly increase the particle multiplicity and the
global transverse energy of each event. Nevertheless, they do not affect the lepton
trigger. Table 4.2 shows that, as before, roughly 10,000 Higgs boson and about 2
million background events per year pass the "zero-level" lepton trigger.

Events are again analyzed by dividing the solid angle into "calorimeter" cells
having size ΔηΔφ = 0.2 x 15°, but in this case we ignore all cells with ET < 1 GeV.
This is done to reduce the number of non-zero cells, which saves time and improves
the jet algorithm. Jets are defined as before, but the threshold for a "hot" cell is
raised to 10 GeV. This means that the minimum jet transverse energy is now
10 GeV (compared to 5 GeV in the analysis without pile-up).


[Figure 4.11 plot: "Generated Number of Interactions". Percentage of events
versus number of interactions per beam crossing.]


Figure 4.11: Generated number of minimum bias interactions per beam crossing.
These events enter the calorimeter together with one H -> ZZ signal event or one
Z+jets background event to simulate the case of multiple interactions per beam
crossing (pile-up).

Except for these changes, the jet-pair selection is done as before with similar
results. Table 4.2 shows that of the 10,000 signal events passing the "zero-level"
lepton cut about 50% also pass the jet-pair selection criterion. Also, about 30% of
the ordinary Z+jets background events that survive the "zero-level" lepton trigger
have a jet-pair that meets the selection criterion.

The  jet-jet  invariant  mass  for  the  signal  and  background  events  that  have  passed 
the  "zero-level"  lepton  trigger  and  the  jet-pair  selection  criterion  is  shown  in  Figure 
4.1.  Comparison  with  the  no  pile-up  case  shows  that  the  Z  mass  peak  has  shifted 
up  about  20  GeV  and  become  somewhat  broader.  This  is,  of  course,  due  to  the 
pile-up  interactions  which  have  contributed  transverse  energy  and  mass  to  the  jet- 
pair.  Rather  than  trying  to  subtract  out  this  effect,  we  simply  shift  our  jet-jet 
mass  cut  to 


                          Signal H -> ZZ          Background Z+jets                 Enhancement Factor
Selection Cut             % Overall  Events/year  % Overall  Events/year  Back/Sig  Relative  Overall
-------------------------------------------------------------------------------------------------------
Lepton trigger            100%       10212        100%       1973919      193       1.0       1.0
  PT(l) > 20 GeV
  PT(ll) > 100 GeV
Jet pair selection        53.3%      5440         33.6%      662850       122       1.6       1.6
  ET(j) > [illegible] GeV
  PT(jj) > 100 GeV
Z-mass cut                19.3%      1973         2.3%       44693        23        5.4       8.5
  81 < Mz < 101 GeV
H-mass cut                14.6%      1489         0.7%       13615        9.1       2.5       21.1
  350 < MH < 450 GeV
Z-mass & net cut          6.8%       696          0.2%       3230         4.6       4.9       41.7
  81 < Mz < 101 GeV
  net cut > 0.75
H-mass & net cut          5.6%       568          0.1%       1525         2.7       3.4       72.0
  350 < MH < 450 GeV
  net cut > 0.75

Table 4.2: 400 GeV Higgs boson produced in 15 TeV p-p collisions. The table
shows the number of events per year (with Lum=10^5/pb) for the H -> ZZ signal
and Z+jets background for the case of multiple interactions per beam crossing
(i.e., with pile-up). The "zero-level" lepton trigger is used as a reference point and
is normalized to 100%. The enhancement factor is defined to be the percentage of
signal divided by the percentage of background surviving the given set of cuts.


•  100 < Mjj(full) < 120 GeV,

where Mjj(full) is defined as before with Rjj(full) = 0.6. As before, the
invariant mass of the jet-pair is used only in the selection of events; the
Higgs mass is reconstructed from the momentum of the jet-pair with Mjj set
equal to Mz. As can be seen from Table 4.2, in this case about 36% of the
Higgs boson signal passing both the "zero-level" lepton cut and the jet-pair
selection criterion have Mjj within this range, which is less than the
51% for the no pile-up case. About 7% of the Z+jets background events
surviving both the "zero-level" lepton cut and the jet-pair selection criterion
have a full jet-pair invariant mass in this range, which is about the same as


[Figure 4.12 plot: "Jet Multiplicity". Percentage of events versus number of
jets with ET > 10 GeV for the Higgs->ZZ signal and the Z+jets background.
400 GeV Higgs in 15 TeV pp collisions, 100 < Mjj < 120 GeV, with pile-up.]


Figure 4.12: Shows the multiplicity of jets for 400 GeV Higgs bosons produced in
15 TeV p-p collisions. The plot corresponds to the percentage of events with N jets
with ET greater than 10 GeV for the H -> ZZ signal and the Z+jets background
for the case of multiple interactions per beam crossing (pile-up). The events have
survived the "zero-level" lepton trigger and the jet-pair selection criterion with
100 < Mjj(full) < 120 GeV.


the  no  pile-up  case.  This  corresponds  to  an  overall  enhancement  factor  at 
this  stage  of  about  8  with  an  overall  efficiency  of  about  19%,  which  is  slightly 
worse  than  the  no  pile-up  case. 

At this stage, Table 4.2 shows that there are about 1,500 Higgs events and
14,000 background events per year within 50 GeV of the true Higgs mass that
pass the "zero-level" lepton trigger, the jet-pair selection criterion, and have
100 < Mjj(full) < 120 GeV. This corresponds to an overall enhancement factor of
about 21 with an overall efficiency of about 15%. With this enhancement, the Z+jets
background is roughly 9 times the signal. At this stage, we apply a neural network
to improve the signal to background ratio beyond what can be achieved with these
standard cuts.


[Figure 4.13 plot: "Transverse Energy Fraction". Percentage of events versus
ET(0.6<R<1.0)/ET(R<1.0) for the Higgs->ZZ signal and the Z+jets background.
400 GeV Higgs in 15 TeV pp collisions, 100 < Mjj < 120 GeV, with pile-up.]


Figure 4.13: Shows the fraction of transverse energy coming from calorimeter cells
within the "halo" region 0.6 < Rjj < 1.0 surrounding either of the away-side jets.
The plot corresponds to the percentage of events with the jet-jet transverse energy
fraction within the 0.025 bin for the H -> ZZ signal and the Z+jets background
for the case of multiple interactions per beam crossing (pile-up). The events have
survived the "zero-level" lepton trigger and the jet-pair selection criterion and have
100 < Mjj(full) < 120 GeV.


4.8    Network  Performance 

We  use  the  same  nine  variables  to  characterize  the  events,  but  since  these 
variables  have  changed  dramatically,  the  network  must  be  retrained.  Figure  4.12 
shows  the  new  jet  multiplicities.  Figure  4.13  and  Figure  4.14  show  that  the  fraction 
of  transverse  energy  and  mass,  respectively,  originating  in  the  extended  region, 
0.6  <  Rjj  <  1.0,  has  greatly  increased  for  both  the  signal  and  background  events 
due  to  the  pile-up.  Nevertheless,  there  are  still  slight  differences  between  signal 
and  background  that  the  network  can  use  to  distinguish  between  the  two. 

The 9-16-8-1 (305) network is retrained on a sample of 2,741 signal and 3,566
background  events  that  include  the  pile-up  interactions.  Both  signal  and  back- 


[Figure 4.14 plot: "Mass Fraction". Percentage of events versus
M(0.6<R<1.0)/M(R<1.0) for the Higgs->ZZ signal and the Z+jets background.
400 GeV Higgs in 15 TeV pp collisions, 100 < Mjj < 120 GeV, with pile-up.]


Figure 4.14: Shows the fraction of invariant mass coming from calorimeter cells
within the "halo" region 0.6 < Rjj < 1.0 surrounding either of the away-side jets.
The plot corresponds to the percentage of events with the jet-jet invariant mass
fraction within the 0.05 bin for the H -> ZZ signal and the Z+jets background
for the case of multiple interactions per beam crossing (pile-up). The events have
survived the "zero-level" lepton trigger and the jet-pair selection criterion and have
100 < Mjj(full) < 120 GeV.


ground events have already satisfied the "zero-level" lepton cuts, the jet-pair
selection, and have 100 < Mjj(full) < 120 GeV. To get this training sample it was
necessary to generate 40,000 Higgs boson events with pile-up and 400,000 Z+jet
events with pile-up. Running with pile-up is much slower since a large number of
events enter the calorimeter during each beam crossing. Because of this we are
using a very small training sample. We could do better with a larger sample, but
this is sufficient for what we want to illustrate in this paper. After training, we
achieve a xiet = 0.1797 with a network response for the training events shown in
Figure 4.15. Figure 4.15 also shows the network response (i.e., znet) for an
independent sample of signal and background events not used in the training. In spite


[Figure 4.15 plot: "Network Response". Percentage of events versus network
output for signal and background, training sample and independent sample.
Net = 9-16-8-1 (305). 400 GeV Higgs in 15 TeV pp collisions,
100 < Mjj < 120 GeV, with pile-up.]


Figure 4.15: Shows the network response, znet, for the sample of signal and
background events used in the training and for an independent sample of signal
and background events. The plot corresponds to the percentage of events with
znet within a 0.05 bin for the H -> ZZ signal and Z+jets background for the
case of multiple interactions per beam crossing (pile-up). The events have
survived the "zero-level" lepton trigger and the jet-pair selection criterion and have
100 < Mjj(full) < 120 GeV.


of  the  small  training  sample,  the  network  generalizes  fairly  well. 
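
The label 9-16-8-1 (305) is consistent with nine inputs, hidden layers of 16 and 8 nodes, one output, and 305 adjustable weights and thresholds. The sketch below checks that count and runs one forward pass. The sigmoid activation, the random initial weights, and all function names are assumptions made for illustration; the trained weights of the actual network are not reproduced here.

```python
import math
import random

LAYERS = [9, 16, 8, 1]  # inputs, two hidden layers, one output

def count_parameters(layers):
    """Weights plus one threshold (bias) per node in every non-input layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

def init_network(layers, rng):
    """Random weights and biases (illustrative initialisation only)."""
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [rng.uniform(-0.5, 0.5) for _ in range(n_out)])
        for n_in, n_out in zip(layers, layers[1:])
    ]

def forward(network, inputs):
    """Propagate the nine input variables to a single output in (0, 1)."""
    values = list(inputs)
    for weights, biases in network:
        values = [
            1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, values)) + b)))
            for row, b in zip(weights, biases)
        ]
    return values[0]

n_params = count_parameters(LAYERS)
net = init_network(LAYERS, random.Random(1))
z_net = forward(net, [0.1] * 9)   # hypothetical, already-scaled inputs
print(n_params, round(z_net, 3))
```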

The network performance for the training sample is shown in Figure 4.7
together with the no pile-up case. Again we choose a network cut-off, zcut, of 0.75,
which in this case for the training sample corresponds to a relative enhancement
of about 6 with a relative efficiency of about 38%.
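
The effect of a network cut-off can be sketched as follows: keep events with znet above zcut, then compare the surviving signal and background fractions. The score lists below are invented for illustration and are not values from this analysis; `cut_performance` is a hypothetical helper name.

```python
def cut_performance(signal_scores, background_scores, z_cut):
    """Relative efficiency and enhancement for the selection z_net > z_cut."""
    sig_frac = sum(z > z_cut for z in signal_scores) / len(signal_scores)
    bkg_frac = sum(z > z_cut for z in background_scores) / len(background_scores)
    enhancement = sig_frac / bkg_frac if bkg_frac > 0 else float("inf")
    return sig_frac, enhancement

# Hypothetical network outputs, not values from the analysis.
signal = [0.95, 0.88, 0.80, 0.60, 0.40, 0.90, 0.30, 0.78]
background = [0.10, 0.20, 0.85, 0.30, 0.50, 0.15, 0.05, 0.40, 0.25, 0.35]

efficiency, enhancement = cut_performance(signal, background, 0.75)
print(f"relative efficiency = {efficiency:.3f}, enhancement = {enhancement:.2f}")
```

Scanning `z_cut` over a range of values traces out the trade-off between efficiency and enhancement from which a working point such as 0.75 is chosen.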

We now analyze an independent sample of signal and background events with
pile-up. Figure 4.16 shows the reconstructed Higgs mass for both the signal and
background events that have passed the lepton cuts, the jet-pair selection with
100 < Mjj(full) < 120 GeV, and the network cut-off (with zcut = 0.75). Now,
there are about 600 Higgs events and 1500 QCD background events per year




Figure 4.16: Shows the reconstructed mass of a 400 GeV Higgs boson produced in
15 TeV p-p collisions. The plot corresponds to the number of events per year (with
Lum=10^5/pb) in a 25 GeV bin for the H -> ZZ signal and the Z+jets background
for the case of multiple interactions per beam crossing (pile-up). The events have
survived the "zero-level" lepton trigger and the jet-pair selection criterion with
100 < Mjj(full) < 120 GeV and have passed the network cut-off (i.e., have znet >
0.75).


within  50  GeV  of  the  true  Higgs  mass  of  400  GeV.  This  corresponds  to  an  overall 
enhancement  factor  of  about  72  (see  Table  4.2)  with  an  overall  efficiency  of  about 
6%.  Although  the  results  are  not  quite  as  good  as  the  no  pile-up  case,  signal  and 
background  are  again  roughly  comparable  and  the  network  has  improved  the  signal 
to  background  ratio  by  about  a  factor  of  4. 

Therefore, using observables that measure how transverse energy and mass,
respectively, are distributed around the away-side jet-jet system, a neural network
can help to distinguish the two-jet system originating from the qq̄ decay of a color
singlet Z boson from a random jet-pair coming from the "ordinary" QCD gluon
bremsstrahlung of colored quarks and gluons. We have used the neural network
in conjunction with the standard Higgs boson cuts to provide additional signal to
background enhancements. The Higgs mass is reconstructed from the momentum
of the jet-pair with Mjj set equal to Mz. We are able to obtain an overall signal to
background enhancement of around 10 with the standard Higgs boson cuts. The
neural network provides an additional enhancement of 4-5 beyond what can be
achieved with the standard data cuts, resulting in an overall enhancement of about
50. We believe that we could further improve the network performance by using
larger training samples and by increasing the number of input variables to include
additional global information such as the number of forward jets in the event.

Our method works even with a large number of interactions per beam crossing.
This shows that some jet physics can be done even in the large pile-up environment
of the LHC. Although this paper is not a detailed simulation, experiments at
the LHC should be able to do as well as or better than our analysis. Furthermore,
our procedure can be applied to W bosons and should help enhance the Higgs ->
WW -> lνjj signal at hadron colliders as well.


CHAPTER  5 
ENHANCING  THE  TOP  QUARK  SIGNAL 


The Standard Model has enjoyed outstanding success, yet the top quark, which
is required as the weak-isospin partner of the bottom quark, has been difficult to
observe. More evidence and better techniques are required to substantiate its
existence. We propose to use computational techniques such as neural networks or
Fisher discriminants to help analyze data for the top quark [37, 2, 4]. We
investigate the event signature of the lνbb̄qq̄ decay mode of top-pair production in
proton-antiproton collisions at 1.8 TeV. Neural networks and Fisher discriminants
are used in conjunction with modified Fox-Wolfram "shape" variables to help
distinguish the top-pair signal from the W+jets and bb̄+jets background. Instead
of requiring at least four jets in the event, we find that it is faster and better to
simply cut on the number of calorimeter cells with transverse energy greater than
some minimum. By combining these cell cuts with the event shape information
we are able to obtain a signal to background ratio of around 9 while keeping 30%
of the signal. This corresponds to a signal to background enhancement of around
370.

5.1    Top  Decay  Processes 

One expects that, at Tevatron energies, most top quarks are produced in pairs.
For Mtop > 85 GeV/c^2, each top quark decays to a real W boson and a b quark.
The observed event topology is then determined by the decay mode of the two W
bosons. About 5% of the time both W bosons decay to eν or μν (the "dilepton
mode"), giving two high-PT leptons with opposite charge, two b jets, and large
missing transverse energy from the undetected neutrinos. In another 30% of the
cases one W boson decays to eν or μν, and the other to a qq̄' pair (the "lepton+jets
mode"). This final state includes a high-PT charged lepton, and jets from the W
and the two b quarks. The remaining 65% of the final states involve the hadronic
decays of both W bosons, or the decay of one or both of the W bosons into τ
leptons. These channels have larger backgrounds and are not considered here.
This analysis is based on a sample of pp̄ collisions at √s = 1.8 TeV with an
integrated luminosity of 19.3 ± 0.7 pb^-1.

The top quark decays into a b-quark and a W boson, t -> bW. The W boson
decays into a lepton (e or μ) and a neutrino about 22% (2/9) of the time and into
a quark-antiquark pair about 67% (6/9) of the time. This implies that when top
pairs are produced in hadron-hadron collisions, pp̄ -> tt̄ + X, both of the W bosons
decay into a lepton and neutrino only about 5% of the time, resulting in a final
state consisting of two leptons, two neutrinos, and two b-quarks (llννbb̄). This
distinctive final state constitutes the "discovery" mode of the top quark at hadron
colliders. On the other hand, it is considerably more likely for one of the W bosons
to decay into a quark-antiquark pair, resulting in a final state consisting of a lepton,
a neutrino, and a bb̄ and a qq̄ pair. The lνbb̄qq̄ mode shown in Figure 5.1 occurs
about 35% of the time or about 7 times more often than the purely leptonic mode.
The backgrounds are larger for this decay mode, but so is the signal. When each of
the four outgoing quarks produces a distinct jet, the resulting event contains
a lepton, a neutrino, and four jets (lνjjjj). This decay mode is used to analyze
the properties of the top quark in more detail and to determine, for example, the
top mass. The purely hadronic six-jet decay mode occurs about 60% of the time,
but it is completely buried underneath "ordinary" QCD multijet production.
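
As a rough cross-check of the decay-mode fractions, the sketch below combines the per-W fractions quoted above (2/9 for eν or μν, 6/9 for quark pairs) by simple counting. Treating the two W decays as independent and assigning the remainder to τν are assumptions of this sketch; the counting reproduces the approximate 5% / 30% / 65% split given at the start of this section.

```python
from fractions import Fraction

BR_LEPTONIC = Fraction(2, 9)   # W -> e nu or mu nu, as quoted in the text
BR_HADRONIC = Fraction(6, 9)   # W -> quark-antiquark pair, as quoted in the text

# Both W bosons decay to e or mu.
dilepton = BR_LEPTONIC ** 2
# Exactly one W decays to e or mu, the other to quarks (two orderings).
lepton_plus_jets = 2 * BR_LEPTONIC * BR_HADRONIC
# Everything else: all-hadronic decays and decays involving tau leptons.
remaining = 1 - dilepton - lepton_plus_jets

for name, value in [("dilepton", dilepton),
                    ("lepton+jets", lepton_plus_jets),
                    ("remaining", remaining)]:
    print(f"{name:12s} {float(value):.1%}")
```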

For our study, we concentrate on the lνbb̄qq̄ decay mode of top-pair production
in proton-antiproton collisions at 1.8 TeV and investigate ways to optimize this
signal over the backgrounds. The event topology of the signal is shown in Figure
5.1 and consists of a lepton, a neutrino, and four outgoing quarks which manifest
themselves as "jets". In the center-of-mass of a 175 GeV top quark, the W boson
and b-quark decay products each have a momentum of about 70 GeV. Furthermore,
in the CM frame of the W boson, the quark and antiquark decay products each
have a momentum of about 40 GeV. The top pairs are produced near threshold,
resulting in a typical event that is rather spherical in shape with all six decay
products, lνbb̄qq̄, having large transverse energy. The background comes from
the "ordinary" QCD production of large transverse momentum W bosons plus
multiple jets as shown in Figure 5.3 and from the production of b-quark pairs plus
associated jets as illustrated in Figure 5.4. We begin our analysis of the signal
and backgrounds in Section 5.2 with a discussion of the event generation. The lepton
plus missing transverse energy cuts are discussed in Section 5.3. The calorimeter
cell cuts are explained in Section 5.4. In Sections 5.5 and 5.6 we reconstruct the
invariant mass of the top-pair and compare it with the true parton-parton CM
energy. In Section 5.7, we introduce modified Fox-Wolfram "shape" variables and
apply them to the outgoing jets. In Sections 5.8 and 5.9 the use of neural networks
and Fisher discriminants is explored.
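
The quoted momenta of about 70 GeV and 40 GeV follow from standard two-body decay kinematics. The sketch below evaluates the rest-frame momentum formula; the W mass of 80.4 GeV and b-quark mass of 4.8 GeV are assumed values chosen for illustration, and the light quarks from the W decay are taken to be massless.

```python
import math

def two_body_momentum(parent_mass, m1, m2):
    """Momentum of either daughter in the parent rest frame (units of the masses)."""
    s = parent_mass ** 2
    kallen = (s - (m1 + m2) ** 2) * (s - (m1 - m2) ** 2)
    return math.sqrt(kallen) / (2.0 * parent_mass)

M_TOP = 175.0   # GeV, as used in the text
M_W = 80.4      # GeV, assumed value
M_B = 4.8       # GeV, assumed value

p_top_decay = two_body_momentum(M_TOP, M_W, M_B)   # t -> b W
p_w_decay = two_body_momentum(M_W, 0.0, 0.0)       # W -> q qbar', massless quarks
print(f"t -> bW: {p_top_decay:.1f} GeV,  W -> qq: {p_w_decay:.1f} GeV")
```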

5.2    Event  Generation 

ISAJET version 7.06 is used to generate top quarks with a mass of 175 GeV
in 1.8 TeV proton-antiproton collisions. At this energy, 175 GeV top-pairs are
produced via quark-antiquark annihilation, qq̄ -> tt̄, about 88% of the time and by



Figure 5.1: Illustration of top-pair production in p-p̄ collisions in which one of the
W bosons decays leptonically and the other decays hadronically, resulting in a final
state consisting of a lepton, a neutrino, a bb̄ pair, and a qq̄ pair.


Figure 5.2: Shows the event topology for the top-pair signal. If each outgoing
quark produces a distinct jet then the final state contains a lepton, a neutrino
(missing ET), and four jets.




Figure 5.3: Illustrates a W + jets background process to the top pair production
in pp̄ collisions.


gluon-gluon fusion, gg -> tt̄, the remaining 12%. We refer to this as the "signal".
We have normalized the top cross section to be 7.5 pb corresponding to 750 events
with an integrated luminosity of 100/pb. The "background" consists of single
W boson events generated with the hard-scattering transverse momentum, kT,
greater than 25 GeV. Single W bosons are produced at large transverse momentum
via the "ordinary" QCD subprocesses qg -> Wq, q̄g -> Wq̄, and qq̄ -> Wg.
These subprocesses, of course, generate additional gluons via bremsstrahlung off
both incident and outgoing color non-singlet partons, resulting in multiparton final
states which subsequently fragment into hadrons. This is referred to as the W+jets
background. Another background is b-quark pairs produced via the subprocesses
qq̄ -> bb̄ and gg -> bb̄ and the accompanying radiation. This is referred to as the
bb̄+jets background.

We do not attempt to do a detailed simulation of the CDF or D0 detector.
Events are analyzed by dividing the solid angle into "calorimeter" cells having size
ΔηΔφ = 0.1 x 7.5°, where η and φ are the pseudorapidity and azimuthal angle,




Figure 5.4: Illustrates a bb̄ + jets background process to the top pair production
in pp̄ collisions.

respectively. Our simple calorimeter covers the range |η| < 4 and has 3840 cells.
A single cell has an energy (the sum of the energies of all the particles that hit the
cell excluding neutrinos) and a direction given by the coordinates of the center of
the cell. The transverse energy of each cell is then computed from the cell
energy and direction. We have taken the energy resolution to be perfect, which
means that the only resolution effects are caused by the lack of spatial resolution
due to the cell size. Large transverse momentum leptons are analyzed separately
and are not included when computing the energy of a cell.
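
A minimal sketch of such a calorimeter grid is given below. It bins particles into cells of 0.1 in pseudorapidity by 7.5° in azimuth over |η| < 4, sums the energy per cell, and converts it to transverse energy at the cell center using ET = E / cosh(η). The data layout and function names are assumptions for illustration; the caller is expected to have already removed neutrinos and high-PT leptons.

```python
import math

D_ETA = 0.1          # cell size in pseudorapidity
D_PHI_DEG = 7.5      # cell size in azimuth (degrees)
ETA_MAX = 4.0        # coverage |eta| < 4

N_ETA = round(2 * ETA_MAX / D_ETA)   # number of eta bins
N_PHI = round(360.0 / D_PHI_DEG)     # number of phi bins

def cell_index(eta, phi_deg):
    """Return (i_eta, i_phi) for a particle, or None if outside the coverage."""
    if abs(eta) >= ETA_MAX:
        return None
    i_eta = int((eta + ETA_MAX) / D_ETA)
    i_phi = int((phi_deg % 360.0) / D_PHI_DEG)
    return i_eta, i_phi

def fill_grid(particles):
    """Sum particle energies per cell; particles are (energy, eta, phi_deg)."""
    grid = {}
    for energy, eta, phi_deg in particles:
        index = cell_index(eta, phi_deg)
        if index is not None:
            grid[index] = grid.get(index, 0.0) + energy
    return grid

def cell_transverse_energy(index, energy):
    """E_T = E sin(theta) = E / cosh(eta), evaluated at the cell center."""
    eta_center = -ETA_MAX + (index[0] + 0.5) * D_ETA
    return energy / math.cosh(eta_center)

# Hypothetical particles: two share a cell, one falls outside the coverage.
grid = fill_grid([(50.0, 0.02, 10.0), (30.0, 0.07, 12.0), (20.0, 4.5, 0.0)])
et = cell_transverse_energy((40, 1), grid[(40, 1)])
print(N_ETA * N_PHI, grid, round(et, 2))
```

The product of the two bin counts gives the 3840 cells quoted in the text.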


5.3    Lepton Plus Missing Transverse Energy Trigger

The "zero-level" trigger is designed to select large transverse momentum W
bosons that have decayed into a charged lepton and a neutrino. This first cut is
made by demanding that the event contain at least one isolated high transverse
momentum charged lepton (l = e± or μ±) in the central region satisfying:


                        Signal Top (175 GeV)  Background W+jets               Enhancement Factor
Selection Cut           % Remain  Events      % Remain  Events    sig/back    Relative  Overall
--------------------------------------------------------------------------------------------------
PT(l) > 15 GeV          100%      165         100%      7044      0.0234      1.0       1.0
ET(miss) > 20 GeV
PT(lν) > 25 GeV
N(cell) > 7             69.0%     113         0.7%      49        2.3         99.7      99.7
ET(cell) > 5 GeV
ET(sum) > 100 GeV
Fisher cut              30%       49          0.1%      6         8.7         3.7       373.0
F > 0.75 GeV

Table 5.1: 175 GeV top quark pairs produced in 1.8 TeV p-p̄ collisions. The table
shows the number of events (with Lum = 100/pb) for the top-pair signal and the
W+jets background. The bb̄+jets background is shown in parentheses.

•  PT(l) > 15 GeV,  |η(l)| < 2.5.

"Isolated" leptons are defined by demanding that the total transverse energy within
a distance Rl of the lepton in η-φ space be less than ET(max). For this analysis,

•  Rl = 0.2,  ET(max) = 5 GeV.

In addition, the event must have large missing transverse energy, ET(miss), and an
overall lepton-neutrino transverse momentum, PT(lν), given by:

•  ET(miss) > 20 GeV,  PT(lν) > 25 GeV,

where the missing transverse momentum 2-vector is determined from the transverse
energy grid (i.e., the calorimeter) and

    (E_T^{miss})^2 = (E_x^{miss})^2 + (E_y^{miss})^2,    (5.1)

    P_T^2(l\nu) = (p_x^{l} + E_x^{miss})^2 + (p_y^{l} + E_y^{miss})^2,    (5.2)

where  the  x-axis  and  y-axis  are  perpendicular  to  the  colliding  beams  and  the  z-axis 
is  parallel. 
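
The selection described in this section can be sketched as a single function. It takes the lepton transverse momentum components, the missing transverse momentum components, and the transverse energy found within the isolation cone, and applies the thresholds quoted above. The function name and argument layout are assumptions for illustration; computing the cone energy and the missing momentum from the calorimeter grid is left to the caller.

```python
import math

def passes_lepton_trigger(lep_px, lep_py, lep_eta, miss_x, miss_y, et_in_cone):
    """Apply the lepton plus missing E_T selection to one event (GeV units)."""
    lep_pt = math.hypot(lep_px, lep_py)
    miss_et = math.hypot(miss_x, miss_y)                    # Eq. (5.1)
    pt_lnu = math.hypot(lep_px + miss_x, lep_py + miss_y)   # Eq. (5.2)
    return (
        lep_pt > 15.0 and abs(lep_eta) < 2.5   # central high-PT lepton
        and et_in_cone < 5.0                   # isolation: E_T within R = 0.2
        and miss_et > 20.0                     # large missing E_T
        and pt_lnu > 25.0                      # large lepton-neutrino P_T
    )

# Hypothetical events, not taken from the analysis.
accepted = passes_lepton_trigger(30.0, 0.0, 0.5, 10.0, 25.0, 1.2)
back_to_back = passes_lepton_trigger(30.0, 0.0, 0.5, -28.0, 0.0, 1.2)
print(accepted, back_to_back)
```

The second event has a hard lepton and large missing energy, but the two balance each other, so the lepton-neutrino system has little transverse momentum and the event is rejected.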




This selection of PT(l) > 15 GeV, ET(miss) > 20 GeV, and PT(lν) > 25 GeV is
referred to as the lepton plus missing ET trigger. Table 5.1 shows that about
165 top-pair events survive this selection criterion (22% of the total top signal);
for illustration, we take the integrated luminosity to be 100/pb. Table 5.1 also
shows that about 7,000 W+jets and 500 bb̄+jets background events also survive
this cut. The lepton isolation cuts do a good job removing most of the bb̄+jets
background, so we will concentrate primarily on the W+jets background. At this
stage, the background is about 43 times the signal. In order to quantify how
various additional cuts enhance the signal above the background, we define the
enhancement, Fenh, and efficiency, Feff, of a given set of cuts as follows:

    F_{enh} = \frac{\%\ \text{of signal surviving cut}}{\%\ \text{of background surviving cut}},

    F_{eff} = \%\ \text{of signal surviving cut}.

We define the "zero-level" trigger to be the reference point and the fraction of
events surviving this cut is set to 100% in Table 5.1. Similarly, all "enhancement"
factors are set to one at this level as we measure the effectiveness of all other
additional cuts from this point. The overall enhancement and efficiency is determined
by examining the number of events before and after the particular cut.
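
The two definitions translate directly into a small helper. The sketch below applies it to the rounded event counts listed in Table 5.1 for the zero-level trigger and the calorimeter cell cuts; because the tabulated counts are rounded to whole events, the result is close to, but not exactly, the enhancement printed in the table. The function name is a hypothetical choice.

```python
def enhancement_and_efficiency(sig_before, sig_after, bkg_before, bkg_after):
    """F_enh and F_eff for a set of cuts, from event counts before and after."""
    f_eff = sig_after / sig_before                 # fraction of signal surviving
    f_enh = f_eff / (bkg_after / bkg_before)       # divided by background fraction
    return f_enh, f_eff

# Event counts from Table 5.1: zero-level trigger, then calorimeter cell cuts.
f_enh, f_eff = enhancement_and_efficiency(165, 113, 7044, 49)
print(f"F_enh = {f_enh:.1f}, F_eff = {f_eff:.1%}")
```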

5.4    Calorimeter  Cell  Cuts 

At this stage in the analysis one normally demands that the event contain at
least four jets. Cutting on the number of jets is a way to preferentially select the
top-pair signal over the background. However, we have found that it is faster and
better to simply cut on the number of calorimeter cells, Ncell, with transverse
energy greater than some minimum, ET^cell(min). Figure 5.5 shows the cell multiplicity


[Figure 5.5 plot: "Multiplicity of Cells". Percentage of events versus number of
cells with ET(cell) > 5 GeV for the top signal and the W+jets background, with
the position of the cut marked. 175 GeV top quark in 1.8 TeV PbarP collisions
after the "zero-level" lepton plus missing ET trigger.]


Figure 5.5: Shows the multiplicity of calorimeter cells containing at least 5 GeV of
transverse energy for the top-pair signal and the W+jets background. In all cases
the events have survived the "zero-level" lepton plus missing ET trigger. The plot
shows the percentage of events with N cells with ET(cell) > 5 GeV. The position of
our cell cut is marked by the dotted line.


with ET^cell(min) = 5 GeV for the top-pair signal and the W+jets background. On
the average, the top-pair signal populates a larger number of cells than does the
background. Obviously this is because the top-pair signal produces more jets.
However, one does not have to define a "jet" to see this topology. The top-pair
signal produces transverse energy flying out in all directions and this can be seen
directly from the calorimeter cell multiplicity.

The top-pair signal also produces more global transverse energy than the
background. This is shown in Figure 5.6, where we define ET(sum) to be the sum of
the transverse energy of all the calorimeter cells with ET > ET^cell(min). As shown
in Figure 5.5 and Figure 5.6, we make the following calorimeter cell cuts:


[Figure 5.6 plot: "Global Transverse Energy". Percentage of events versus
ET(sum) in GeV (cells with ET(cell) > 5 GeV) for the top signal and the W+jets
background, with the position of the cut marked. 175 GeV top quark in 1.8 TeV
PbarP collisions after the "zero-level" lepton plus missing ET trigger.]


Figure 5.6: Shows the total transverse energy of all the calorimeter cells with
ET(cell) > 5 GeV for the top-pair signal and the W+jets background. In all cases
the events have survived the "zero-level" lepton plus missing ET trigger. The plot
shows the percentage of events with ET(sum) within a 25 GeV bin. The position
of our cell cut is marked by the dotted line.


•  Ncell > 8 with ET^cell(min) = 5 GeV, and ET(sum) > 100 GeV.

At this stage one could cut harder on ET(sum) and remove more background.
However, we want to avoid as much as possible cuts that cause the background
invariant mass to peak at the same place as the top-pair signal. For this reason,
we will use event "shape" variables to further improve the signal to background
ratio.
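
The calorimeter cell cuts need only the list of cell transverse energies, with no jet finding. A minimal sketch follows, with the thresholds from the text exposed as parameters; the function name and the example events are hypothetical.

```python
def passes_cell_cuts(cell_ets, et_cell_min=5.0, n_cell_min=8, et_sum_min=100.0):
    """Calorimeter cell cuts: N_cell > n_cell_min and E_T(sum) > et_sum_min.

    Only cells with E_T above et_cell_min are counted and summed (GeV units).
    """
    hot = [et for et in cell_ets if et > et_cell_min]
    return len(hot) > n_cell_min and sum(hot) > et_sum_min

# Hypothetical events, not taken from the analysis.
busy_event = [15.0] * 10          # many energetic cells
sparse_event = [40.0] * 3         # large E_T(sum) but few cells
soft_event = [4.0] * 20           # many cells, all below threshold

print(passes_cell_cuts(busy_event),
      passes_cell_cuts(sparse_event),
      passes_cell_cuts(soft_event))
```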

Table  5.1  shows  that  of  the  165  top-pair  events  passing  the  "zero-level"  lepton 
plus  missing  ET  cut  roughly  69%  also  pass  the  calorimeter  cell  cuts.  On  the  other 
hand,  less  than  1%  of  the  W+jets  background  events  survive  the  cell  cuts.  The 
calorimeter  cell  cuts  produce  an  enhancement  of  0.69/0.007  or  about  100  over  the 


[Figure 5.7 plot: "Multiplicity of Jets". Percentage of events versus number of
jets with ET(jet) > 15 GeV for the top signal and the W+jets background.
175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus
missing ET trigger and the cell cuts.]


Figure 5.7: Shows the multiplicity of jets with transverse energy greater than 15
GeV for the top-pair signal and W+jets background. In all cases the events have
survived the "zero-level" lepton plus missing ET trigger and the calorimeter cell
cuts. The plot shows the percentage of events with N jets with ET(jet) > 15 GeV.


W+jets background with 69% efficiency, resulting in a signal to background ratio
of about 2. The $N_{cell} > 8$ with $E_T^{cell}$(min) = 5 GeV cut produces more than a
factor of two better enhancement than the traditional "jet cuts" (i.e., $N_{jet} \geq 4$).
Adding the ET(sum) > 100 GeV cut gives an additional relative enhancement of
more than a factor of three.

5.5    Reconstructing  the  Top-Pair  Invariant  Mass 

Ideally  one  would  like  to  reconstruct  the  invariant  mass  of  the  top-pair  from 
its  decay  products:  lepton,  neutrino,  and  four  jets.  However,  the  neutrino  is  not 
detected  and  its  presence  must  be  inferred  by  examining  the  missing  transverse 
momentum, $\not\!p_T$. If we set the transverse momentum components of the neutrino




equal  to  the  missing  transverse  momentum, 


$$ \vec{p}_T^{\,\nu} = \not\!\vec{p}_T , $$


and  assume  that  the  charged  lepton  and  the  neutrino  are  the  result  of  a  W  decay 
(and  neglect  the  W  width)  then  the  longitudinal  momentum  of  the  neutrino  is 
given  by  one  of  the  two  solutions: 


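$$ p_L^{\nu} = \left[ A\, p_L^{\ell} \pm E_\ell \sqrt{A^2 - 4 (p_T^{\ell})^2 (p_T^{\nu})^2} \right] / \, 2 (p_T^{\ell})^2 , $$

where $E_\ell$, $p_L^\ell$, and $p_T^\ell$ are the energy, longitudinal momentum, and transverse
momentum, respectively, of the charged lepton, and $p_T^\nu$ is the transverse momentum
of the neutrino. The quantity $A$ is given by

$$ A = M_W^2 + 2\, \vec{p}_T^{\,\ell} \cdot \vec{p}_T^{\,\nu} = M_W^2 + 2\, p_T^{\ell}\, p_T^{\nu} \cos\phi , $$

where $\phi$ is the azimuthal angle between the transverse momentum vector of the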
charged  lepton  and  the  neutrino.  We  include  both  solutions  in  our  determination 
of  the  top-pair  invariant  mass. 
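
The two-fold neutrino solution can be sketched numerically as follows. This is a minimal illustration rather than the analysis code: the function name, the default W mass of 80.4 GeV, and the choice to return no solution when the square root becomes imaginary are all assumptions made here. Lepton and neutrino are treated as massless.

```python
import math

M_W = 80.4  # GeV; assumed value, the W mass is not quoted in this section


def neutrino_pl_solutions(lep_px, lep_py, lep_pz, met_x, met_y, m_w=M_W):
    """Return the neutrino longitudinal momenta allowed by the W-mass constraint.

    The neutrino transverse momentum is set equal to the missing transverse
    momentum (met_x, met_y). An empty tuple is returned when the discriminant
    is negative; that convention is an assumption of this sketch.
    """
    lep_pt2 = lep_px ** 2 + lep_py ** 2
    nu_pt2 = met_x ** 2 + met_y ** 2
    lep_e = math.sqrt(lep_pt2 + lep_pz ** 2)  # massless charged lepton
    # A = M_W^2 + 2 pT(lepton) . pT(neutrino)
    a = m_w ** 2 + 2.0 * (lep_px * met_x + lep_py * met_y)
    disc = a ** 2 - 4.0 * lep_pt2 * nu_pt2
    if disc < 0.0:
        return ()
    root = lep_e * math.sqrt(disc)
    denom = 2.0 * lep_pt2
    return ((a * lep_pz + root) / denom, (a * lep_pz - root) / denom)
```

Both returned values would then be carried forward, since the text keeps both solutions when forming the top-pair invariant mass.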

We have not used jets in our event trigger; however, we do use jets to reconstruct
the top-pair invariant mass. In addition, we use the jet topology to help
further distinguish the signal from the backgrounds. Jets are defined using a simple
algorithm. One first considers the "hot" cells (those with transverse energy
greater than 5 GeV). Cells are combined to form a jet if they lie within a specified
"radius", $R^2 = \Delta\eta^2 + \Delta\phi^2$, in $\eta$-$\phi$ space from each other. Jets have an energy given
by the sum of the energy of each cell in the cluster and a momentum $\vec{p}_j$ given
by the vector sum of the momenta of each cell. The invariant mass of a jet is
simply $M_j^2 = E_j^2 - \vec{p}_j \cdot \vec{p}_j$. In this analysis, we take the jet radius to be $R_j = 0.4$
and require jets to have at least 15 GeV of transverse energy. Namely,


[Figure 5.8 plot. Title: Top-Pair Invariant Mass; 175 GeV top in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger and cell cuts. Horizontal axis: Mass or Energy (GeV). Curves: Reconstructed Invariant Mass, True parton-parton CM Energy. The position 2M(top) is marked.]


Figure 5.8: Shows the reconstructed top-pair invariant mass, $M_{t\bar{t}}$, for 175 GeV
top quarks produced in 1.8 TeV $p\bar{p}$ collisions (solid curve). The plot contains only
the top-pair signal and corresponds to the number of events per year (with Lum =
100/pb) in a 50 GeV bin. The events have survived the "zero-level" lepton plus missing
ET trigger and the calorimeter cell cuts. Also shown is the true parton-parton CM
energy of the event (not directly observable experimentally).


•  Rj  =  0.4  and  ET(jet)  >  15  GeV. 
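
The jet definition above might be sketched as follows. The text does not spell out how cells are grouped, so seeding each jet on the hottest unused cell, treating each cell as a massless four-vector, and taking the jet transverse energy as the scalar sum of cell transverse energies are all assumptions of this illustration, as are the function and key names.

```python
import math


def cluster_jets(cells, radius=0.4, cell_et_min=5.0, jet_et_min=15.0):
    """Cluster calorimeter cells, given as (eta, phi, et) tuples, into jets.

    Seeding on the hottest unused cell is an assumption of this sketch.
    """
    # Keep only the "hot" cells, hottest first.
    hot = sorted((c for c in cells if c[2] > cell_et_min), key=lambda c: -c[2])
    used = [False] * len(hot)
    jets = []
    for i, (eta0, phi0, _) in enumerate(hot):
        if used[i]:
            continue
        e = px = py = pz = et_sum = 0.0
        for j, (eta, phi, et) in enumerate(hot):
            if used[j]:
                continue
            dphi = math.remainder(phi - phi0, 2.0 * math.pi)  # wrap to [-pi, pi]
            if (eta - eta0) ** 2 + dphi ** 2 <= radius ** 2:
                used[j] = True
                # Treat each cell as a massless four-vector.
                e += et * math.cosh(eta)
                px += et * math.cos(phi)
                py += et * math.sin(phi)
                pz += et * math.sinh(eta)
                et_sum += et
        if et_sum >= jet_et_min:
            jets.append({"E": e, "px": px, "py": py, "pz": pz, "ET": et_sum,
                         "mass2": e * e - (px * px + py * py + pz * pz)})
    return jets
```

A seeded cone of this kind is only one possible reading of "within a specified radius from each other"; a pairwise grouping would give different jets for extended clusters.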

The top-pair invariant mass, $M_{t\bar{t}}$, is constructed from the energy and momentum
of the charged lepton, the energy and momentum of the reconstructed neutrino,
and the overall momentum vector of the associated jets as follows:

$$ M_{t\bar{t}}^2 = (E_\ell + E_\nu + E_{jets})^2 - (\vec{p}_\ell + \vec{p}_\nu + \vec{p}_{jets})^2 , $$

where

$$ \vec{p}_{jets} = \sum_i^{jets} \vec{p}_i , $$

and

$$ E_{jets} = \sum_i^{jets} E_i . $$

The overall jet energy, $E_{jets}$, and momentum, $\vec{p}_{jets}$, are constructed by summing
over all jets with transverse energy greater than 15 GeV. We do not require the
event to have a minimum of four jets. The calorimeter cell cuts have replaced the
need to make a jet multiplicity cut. This can be seen in Figure 5.7 which shows the
multiplicity of jets with ET(jet) > 15 GeV for the top-pair signal and the W+jets
background after the lepton plus missing ET trigger and the calorimeter cell cuts.
The cell cuts have forced the signal and background jet multiplicities to look similar
and one does not gain much by making an additional jet multiplicity cut. (At this
stage, requiring $N_{jet} \geq 4$ would result in an additional relative enhancement of
about 2 with an efficiency of 81%.)
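
As an illustration of the invariant mass construction, the sketch below sums four-vectors given as (E, px, py, pz) tuples and evaluates the mass once per neutrino solution. The function names and the clamping of a slightly negative mass squared to zero are assumptions made here, not details taken from the analysis.

```python
import math


def invariant_mass(four_vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e = px = py = pz = 0.0
    for fe, fx, fy, fz in four_vectors:
        e += fe
        px += fx
        py += fy
        pz += fz
    m2 = e * e - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # clamp rounding-level negatives (assumption)


def top_pair_masses(lepton, neutrino_candidates, jets):
    """One reconstructed mass per neutrino solution, using all selected jets."""
    return [invariant_mass([lepton, nu, *jets]) for nu in neutrino_candidates]
```

Passing both neutrino candidates yields two mass values per event, in line with the statement that both solutions are included.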

5.6    Comparing  With  the  Parton-Parton  CM  Energy 

The top-pair invariant mass, $M_{t\bar{t}}$, corresponds to the center-of-mass energy,
$E_{cm}$, of the underlying parton-parton two-to-two subprocess which has a threshold
at twice the mass of the top quark, $E_{cm} > 2M_{top}$. This is seen clearly in Figure 5.8
which compares the true $q\bar{q} \to t\bar{t}$ and $gg \to t\bar{t}$ CM energy, $E_{cm}$ (not experimentally
observed), with the reconstructed top-pair invariant mass, $M_{t\bar{t}}$. If the neutrino
momentum could be precisely determined from the missing ET and if we knew
exactly which particles to include in the jets then the two curves in Figure 5.8
would agree. Although one cannot precisely reconstruct the parton-parton CM
energy, there still remains a nice peak in the reconstructed top-pair invariant mass
at twice the top mass. We can use the observation of this peak as a measure of
how well one can determine the top quark mass and we would like to remove as


[Figure 5.9 plot. Title: Reconstructed Top-Pair Invariant Mass; 175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger and cell cuts. Horizontal axis: Invariant Mass (GeV). Histogram of Signal + Background with the position 2M(top) marked. Legend: W+Jets Background, Top Signal.]


Figure 5.9: Shows the reconstructed top-pair invariant mass, $M_{t\bar{t}}$, for 175 GeV top
quarks produced in 1.8 TeV $p\bar{p}$ collisions together with the W+jets background.
The plot shows the sum of the signal plus background and corresponds to the
number of events per year (with Lum = 100/pb) in a 50 GeV bin. The events have
survived the "zero-level" lepton plus missing ET trigger and the calorimeter cell
cuts.


much  background  as  possible  from  the  peak. 

Figure 5.8 includes only the top-pair signal with no background. Figure 5.9
shows the reconstructed parton-parton CM energy for the top-pair signal and the
W+jets background after the "zero-level" lepton plus missing ET trigger and the
calorimeter cell cuts. The plot shows the sum of the signal and the background.
At this stage the signal is about twice the background. However, the signal to
background ratio can be improved by examining in more detail the "shape" of the
events.




                   Signal               Background
               Top (175 GeV)             W + jets             Fisher
Moment         mean     stdev         mean     stdev        Coefficient
H1             0.24     0.18          0.36     0.25           -0.500
H2             0.28     0.16          0.44     0.22           -1.282
H3             0.28     0.14          0.40     0.19           -1.088
H4             0.28     0.13          0.41     0.18           -0.978
H5             0.29     0.12          0.40     0.17           -0.544
H6             0.29     0.12          0.40     0.17           -0.069

Table 5.2: Shows the mean value and standard deviation from the mean of
six of the modified Fox-Wolfram moments applied to the jets in the event with
transverse energy greater than 15 GeV. Results are shown for the top-pair signal
and the W+jets background. Also shown are the resulting Fisher coefficients.


5.7    Event Variables: Fox-Wolfram Moments

In 1978 Geoffrey Fox and Stephen Wolfram constructed a complete set of rotationally
invariant observables, $H_l$, which could be used to characterize the "shapes"
of the final states in electron-positron annihilations [38, 39, 40]. They are constructed
from the momentum vectors, $\vec{p}_i$, of all the final state particles as follows,

$$ H_l = \frac{4\pi}{2l+1} \sum_{m=-l}^{+l} \left| \sum_i Y_l^m(\Omega_i) \frac{|\vec{p}_i|}{E_{tot}} \right|^2 , $$

where the inner sum is over the particles produced, $E_{tot}$ is the total energy of the
final state, and $Y_l^m$ are the spherical
harmonics. Here one must choose a particular set of axes to evaluate the angles,
$\Omega_i = (\theta_i, \phi_i)$, of the final state particles, but the values of the $H_l$ are independent
of this choice. These moments lie in the range $0 \leq H_l \leq 1$ and if energy is conserved
in the final state then $H_0 = 1$ (neglecting the masses). If momentum is conserved
in the final state then $H_1 = 0$.

The Fox-Wolfram observables (or moments) constitute a complete set of shape
parameters. For example, the collinear "two-jet" final state results in $H_l \approx 1$ for


[Figure 5.10 plot. Title: Modified H1 Applied to Jets; 175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger and cell cuts. Horizontal axis: H1 in 0.05 bins labeled from 0.025 to 0.925. Vertical axis: percentage of events (0% to 20%). Legend: Top Signal, W+Jets Background.]

Figure 5.10: Shows the modified Fox-Wolfram moment, $H_1$, calculated using the
jets in the event with transverse energy greater than 15 GeV for top-pair signal
and for the W+jets background. The plot shows the percentage of events in a
0.05 bin. The events have survived the "zero-level" lepton plus missing ET trigger
and the calorimeter cell cuts. (If the vector sum of the momentum of all the jets
in the events is zero then $H_1 = 0$.)

even $l$ and $H_l \approx 0$ for odd $l$. Events that are completely spherically symmetric
give $H_l \approx 0$ for all $l \neq 0$.
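
The limiting cases quoted above can be checked with a short sketch of the original (unmodified) Fox-Wolfram moments. It uses the Legendre-polynomial form $H_l = \sum_{i,j} |\vec{p}_i||\vec{p}_j| P_l(\cos\theta_{ij}) / E_{tot}^2$, which is equivalent to the spherical-harmonic definition by the addition theorem. Massless particles are assumed, so that the total energy is the sum of the momentum magnitudes; the function names are illustrative only.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) from the Bonnet recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def fox_wolfram(momenta, l_max=6):
    """Return [H_0, ..., H_l_max] for a list of (px, py, pz) momenta.

    Particles are treated as massless, so E_tot is the sum of |p| (assumption).
    """
    parts = [(p, math.sqrt(sum(c * c for c in p))) for p in momenta]
    parts = [(p, mag) for p, mag in parts if mag > 0.0]
    e_tot = sum(mag for _, mag in parts)
    h = [0.0] * (l_max + 1)
    for pi, mi in parts:
        for pj, mj in parts:
            cos_ij = sum(a * b for a, b in zip(pi, pj)) / (mi * mj)
            cos_ij = max(-1.0, min(1.0, cos_ij))  # guard against rounding
            weight = mi * mj / e_tot ** 2
            for l in range(l_max + 1):
                h[l] += weight * legendre(l, cos_ij)
    return h
```

For two equal back-to-back momenta this gives moments near one for even $l$ and near zero for odd $l$, matching the collinear "two-jet" case described in the text.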

In hadron-hadron collisions spherical symmetry is lost and we are interested
more in the shape of events in the transverse plane. For example, the Fox-Wolfram
moments when applied directly to hadron-hadron collisions would interpret a minimum
bias event as a "two-jet" event, whereas we would like to have a minimum
bias event treated more like a "spherically symmetric" $e^+e^-$ final state (i.e., no
structure). To accomplish this we define the following modified Fox-Wolfram moments
for hadron-hadron collisions,


[Figure 5.11 plot. Title: Modified H2 Applied to Jets (ET(jet) > 15 GeV); 175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger and cell cuts. Horizontal axis: H2 in 0.05 bins labeled from 0.025 to 0.925. Vertical axis: percentage of events (0% to 20%). Legend: Top Signal, W+Jets Background.]


Figure  5.11:  Shows  the  modified  Fox-Wolfram  moment,  H2,  calculated  using  all 
the  jets  in  the  event  with  transverse  energy  greater  than  15  GeV  for  top-pair  signal 
and  for  the  W  +  jets  background.  The  plot  shows  the  percentage  of  events  in  a 
0.05  bin.  The  events  have  survived  the  "zero-level"  lepton  plus  missing  ET  trigger 
and  the  calorimeter  cell  cuts. 


where the inner sum is now over all the jets in the event with transverse energy,
$E_T^i$, greater than 15 GeV and $\Omega_i = (\theta_i, \phi_i)$ is the angular location of the jet. Here,
ET(sum) is the sum of the transverse energy of all the jets that are included in
the sum. These modified moments also lie in the range $0 \leq H_l \leq 1$ and by
definition $H_0 = 1$. Furthermore, if the transverse momentum of the jets in the
event is conserved then $H_1 = 0$. In this case, however, events that are completely
cylindrically symmetric about the beam axis give $H_l \approx 0$ for all $l \neq 0$.

Table 5.2 shows the mean values and standard deviations for six of the modified


[Figure 5.12 plot. Title: Modified H4 Applied to Jets (ET(jet) > 15 GeV); 175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger and cell cuts. Horizontal axis: H4 in 0.05 bins labeled from 0.025 to 0.925. Vertical axis: percentage of events (0% to 20%). Legend: Top Signal, W+Jets Background.]


Figure 5.12: Shows the modified Fox-Wolfram moment, H4, calculated using all
the  jets  in  the  event  with  transverse  energy  greater  than  15  GeV  for  top-pair  signal 
and  for  the  W  +  jets  background.  The  plot  shows  the  percentage  of  events  in  a 
0.05  bin.  The  events  have  survived  the  "zero-level"  lepton  plus  missing  ET  trigger 
and  the  calorimeter  cell  cuts. 


Fox-Wolfram moments calculated using all jets with ET(jet) > 15 GeV for events
that have survived the "zero-level" lepton and missing ET trigger and the calorimeter
cell cuts. There are clearly still some differences between the jet topologies of
the top-pair signal and the W+jets background. The mean values of the six moments
$H_1, \ldots, H_6$ are smaller for the signal than the background, indicating that
the jets originating from the top-pair signal form a more cylindrically symmetric
pattern when they emerge from the event than do the background jets. The top-pair
jets are more spread out in $\eta$-$\phi$ space. This can be seen in Figures 5.10, 5.11
and 5.12 which show the $H_1$, $H_2$, and $H_4$ distributions, respectively, for the signal
and background. At this stage one could simply make a linear cut on, for example,




$H_2$. Requiring $H_2 < 0.3$ gives an additional signal to background enhancement
of about 2 with a relative efficiency of around 60%. One can do a little better,
however, by using the information of all six of the $H_l$ simultaneously. This can
be done by constructing a neural network or by using Fisher discriminants.

5.8    Neural  Network  Analysis 

Neural networks can be used to separate signal from background. As before,
we construct neural networks consisting of a set of $N_{in}$ inputs, $\{x\}$, which can have
any value and one output, $z_{net}$, which is restricted to the range $0 < z_{net} < 1$. The
net output is a function of the input set $\{x\}$ and the network "memory" (weight,
$w$) parameters as follows:

$$ z_{net} = F_{net}(\{x\}, \{w\}). $$

The goal is to construct a network that can distinguish between two patterns of
input data, "signal" events and "background" events, where each event is characterized
by the $N_{in}$ variables. A "perfect" network responds with $z_{net}$ near one for a
signal input and with $z_{net}$ near zero for a background input, and a single cut can be
made on this network output which will enhance the signal over the background.

Of course, the key to a good network lies in the selection of the input variables.
These variables must characterize the differences between the signal and the background.
We choose the first six modified Fox-Wolfram variables applied to the jets
as network inputs:

$$ x_1 = H_1, \quad x_2 = H_2, \quad x_3 = H_3, \quad x_4 = H_4, \quad x_5 = H_5, \quad x_6 = H_6 . $$

The network is trained on a sample of 4,000 top-pair signal and 3,814 W+jets


[Figure 5.13 plot. Title: Network Response (Network 6-12-1 (97)); 175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger and cell cuts. Horizontal axis: Network Output, bins labeled from 0.025 to 0.925. Vertical axis: percentage of events (0% to 20%). Legend: Top Signal, W+Jets Background.]


Figure 5.13: Shows the network response, $z_{net}$, for the sample of signal and background
events used in the training. The plot corresponds to the percentage of
events with $z_{net}$ within a 0.05 bin for the top-pair signal and the W+jets background.
The events have survived the "zero-level" lepton plus missing ET trigger
and the calorimeter cell cuts.


background events using the six inputs shown above, where both signal and
background events have already satisfied the lepton and missing ET trigger and
the calorimeter cell cuts. To get this training sample, it was necessary to generate
50,000 top-pair events and 1,200,000 W+jet events.

As discussed in Chapter 3, there is no systematic procedure that provides
the best network topology for a given problem. One looks for the simplest network
that can discriminate signal from background. Here we use a simple network with
only one "hidden layer". We use a 6-12-1 net which has 97 memory parameters.
Figure 5.13 shows the network response (i.e., $z_{net}$) for the sample of signal and
background events used in the training. The situation is far from ideal. There



Figure 5.14: Shows the enhancement versus the efficiency for the training sample
of events for a 6-12-1 neural network with 97 memory parameters. Each point in
the plot corresponds to a different choice for the network cut-off with the lower
efficiencies and higher enhancements corresponding to larger values of $z_{cut}$. The network
enhancements are compared with the enhancements obtained by the use of
Fisher discriminants.


are some events around $z_{net} \approx 0.5$ for which the net cannot distinguish between
signal and background. Nevertheless, the net does allow for some separation of
signal and background. The net clearly recognizes some events as signal or background,
while for other events there is an overlap and the net cannot distinguish
between the two.
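
The quoted count of 97 memory parameters for the 6-12-1 net can be reproduced with a small sketch. It assumes one bias per hidden and output node (6·12 + 12 + 12 + 1 = 97) and sigmoid activations throughout; neither detail is stated in this section, and the class name and random initial weights are purely illustrative (the network here is untrained).

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class FeedForwardNet:
    """One-hidden-layer net; sigmoid nodes and bias terms are assumptions."""

    def __init__(self, n_in=6, n_hidden=12, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = rng.uniform(-1, 1)

    def n_parameters(self):
        # Hidden weights + hidden biases + output weights + output bias.
        return (sum(len(row) for row in self.w_hidden) + len(self.b_hidden)
                + len(self.w_out) + 1)

    def output(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
```

With a sigmoid output node the response is automatically confined to the range between zero and one, as required of $z_{net}$.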

The next step is to perform a network cut-off and assign any event with $z_{net} >
z_{cut}$ to be signal and events with $z_{net} < z_{cut}$ to be background. The enhancement
and efficiency of the network cut-off depend on the value chosen for $z_{cut}$, where
the network enhancement and efficiency are defined as follows:




$$ F_{enh}^{net} = \frac{\%\ \text{of signal with } z_{net} > z_{cut}}{\%\ \text{of background with } z_{net} > z_{cut}} , $$

$$ F_{eff}^{net} = \%\ \text{of signal with } z_{net} > z_{cut} . $$

The overall network performance can be characterized by the single curve of the
network enhancement versus the network efficiency shown in Figure 5.14. Each
point in Figure 5.14 corresponds to a different choice for the network cut-off with
the lower efficiencies and higher enhancements corresponding to larger values of
$z_{cut}$. For example, a net cut of $z_{cut} = 0.75$ corresponds to an additional enhancement
of about 4 with a relative efficiency of about 47%.
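
The enhancement and efficiency defined above can be evaluated for any list of discriminant outputs, as in the following sketch. The function name and the convention of returning infinity when no background event survives the cut are assumptions of this illustration.

```python
def cut_performance(signal_outputs, background_outputs, z_cut):
    """Return (enhancement, efficiency) for the cut output > z_cut."""
    sig_frac = sum(1 for z in signal_outputs if z > z_cut) / len(signal_outputs)
    bkg_frac = sum(1 for z in background_outputs if z > z_cut) / len(background_outputs)
    # Enhancement is the ratio of surviving fractions; infinite if no
    # background event passes (a convention chosen for this sketch).
    enhancement = sig_frac / bkg_frac if bkg_frac > 0 else float("inf")
    return enhancement, sig_frac
```

Scanning `z_cut` over a grid of values and plotting the resulting pairs would trace out a performance curve of the kind shown in Figure 5.14.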

5.9    Fisher Discriminants

The Fisher discriminant offers a more straightforward approach and yet, in this case,
gives as good a result as the neural network. As before, we want to
find the discriminant function, $F$, as a linear function of the inputs

$$ F = \sum_{i=1}^{N_{in}} \alpha_i x_i , $$

such that the Fisher coefficients, $\alpha_i$, are chosen to maximize the separation between
signal and background in $F$-space,

where $\mu_F$ and $\sigma_F$ are the mean and the standard deviation, respectively, of the
Fisher output for the signal (sig) and background (bak) sample. The Fisher coefficients
are given by


[Figure 5.15 plot. Title: Fisher Response; 175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger and cell cuts. Horizontal axis: Fisher Output, 0.05 bins labeled from 0.025 to 0.925. Vertical axis: percentage of events (0% to 30%). The cut position is marked. Legend: Top Signal, W+Jets Background.]


Figure 5.15: Shows the "shifted" Fisher response, $F$, for the sample of signal and
background events used in the training of the neural network. The plot corresponds
to the percentage of events with $F$ within a 0.05 bin for the top-pair signal and
the W+jets background. The events have survived the "zero-level" lepton plus
missing ET trigger and the calorimeter cell cuts. The position of our "Fisher cut"
is marked by the dotted line.


where $(V_{sig} + V_{bak})^{-1}$ is the inverse matrix and $\mu_i$ is the mean of the distribution

In this case training consists of calculating the Fisher coefficients, which involves
inverting an $N_{in} \times N_{in}$ matrix, but is easier than training a network. Once this is
done  the  situation  is  similar  to  the  network  (with  F  replacing  znet).  For  each  input 
of  Nin  variables  there  is  one  output  F.  We  have  determined  the  Fisher  coefficients 
for  the  sample  of  signal  and  background  events  used  to  train  our  network  and  the 
Fisher  response  for  these  events  is  shown  in  Figure  5.15.  In  plotting  the  Fisher 


[Figure 5.16 plot. Title: Reconstructed Top-Pair Invariant Mass; 175 GeV top quark in 1.8 TeV PbarP collisions after the "zero-level" lepton plus missing ET trigger, cell cuts, and Fisher cut. Horizontal axis: Invariant Mass (GeV). Legend: W+Jets Background, Top Signal.]


Figure 5.16: Shows the reconstructed top-pair invariant mass, $M_{t\bar{t}}$, for 175 GeV top
quarks produced in 1.8 TeV $p\bar{p}$ collisions together with the W+jets background.
The plot shows the sum of the signal plus background and corresponds to the
number of events per year (with Lum = 100/pb) in a 50 GeV bin. The events have
survived the "zero-level" lepton plus missing ET trigger, the calorimeter cell cuts,
and the Fisher cut-off.


response in Figure 5.15, we have shifted $F$ to lie between zero and one as follows:

$$ F \to \frac{F - F_{min}}{F_{max} - F_{min}} . $$

In this analysis all the inputs, $x_i$, lie between zero and one and all the Fisher
coefficients, $\alpha_i$, turn out negative which implies that

$$ F_{max} = 0 \quad \text{and} \quad F_{min} = \sum_i \alpha_i . $$
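
The Fisher training step and the shift to the unit interval might be sketched as follows. The displayed formula for the coefficients is not recoverable from this text, so the sketch assumes the textbook form $\alpha = (V_{sig} + V_{bak})^{-1} (\mu_{sig} - \mu_{bak})$, with $V$ the covariance matrix of the inputs in each sample; sign and normalization conventions may differ from those used in the analysis. All function names are illustrative.

```python
def mean_and_cov(samples):
    """Mean vector and covariance matrix of a list of equal-length tuples."""
    n, d = len(samples), len(samples[0])
    mu = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
            for j in range(d)] for i in range(d)]
    return mu, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_coefficients(signal, background):
    """Assumed textbook form: (V_sig + V_bak)^-1 (mu_sig - mu_bak)."""
    mu_s, v_s = mean_and_cov(signal)
    mu_b, v_b = mean_and_cov(background)
    d = len(mu_s)
    v = [[v_s[i][j] + v_b[i][j] for j in range(d)] for i in range(d)]
    return solve(v, [ms - mb for ms, mb in zip(mu_s, mu_b)])


def fisher_output(alpha, x):
    return sum(a * xi for a, xi in zip(alpha, x))


def shifted(f, f_min, f_max):
    """Map F linearly onto the interval between zero and one."""
    return (f - f_min) / (f_max - f_min)
```

Solving the linear system directly avoids forming the inverse matrix explicitly, which gives the same coefficients for a well-conditioned matrix.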


The  separation  between  signal  and  background  in  Figure  5.15  is  about  the 
same  as  the  network.  As  with  the  network,  the  overall  Fisher  performance  can 
be  characterized  by  the  single  curve  of  the  Fisher  enhancement  versus  the  Fisher 




efficiency  which  is  shown  in  Figure  5.14  together  with  the  network  performance. 
Each  point  corresponds  to  a  different  choice  for  the  Fisher  cut-off. 

Figure 5.14 shows that Fisher discriminants have essentially the same performance
curve as does the neural network and since it is simpler to calculate the
Fisher function, we complete our analysis by making a cut on $F$ as follows:

F > 0.75.

As can be seen in Table 5.1, this Fisher cut provides an additional enhancement
of around 4 with a relative efficiency of about 44%, resulting in an overall signal to
background ratio of about 9. Figure 5.16 shows the reconstructed parton-parton
CM energy for the top-pair signal and the W+jets background after the "zero-level"
lepton plus missing ET trigger, the calorimeter cell cuts, and the Fisher
cut. The plot shows the sum of the signal and the background.


CHAPTER  6 
SUMMARY  AND  CONCLUSION 

6.1    Neural  Analysis 

We have investigated the application of Artificial Intelligence in High Energy
Physics and discussed its advantages over the conventional techniques in data
analysis. We presented two case studies by applying these methods in the search
for the Higgs boson and the Top quark. In both cases, we demonstrated the
limitation of the conventional methods of event classification, which apply various
cuts on observed kinematical variables independently. We showed that using Neural
Networks in addition to normal cuts gives an additional enhancement of the
ratio of signal events to background events. We could have applied
the Neural Networks technique directly to the raw data without preprocessing it
with normal cuts. We decided, however, to apply the initial cuts in order to be
certain that those physical attributes that signify the signal process are included
in the event selections. In addition, we showed that Fisher discriminants can be
useful if the variable space is relatively simple, but fall considerably short if
the variable space is more complex. The Fisher discriminant method performed as
well as Neural Networks in the analysis of the Top data, yet it performed poorly
compared to Feed-forward Neural Networks in the analysis of the Higgs data.

We can summarize the advantages of Feed-forward Neural Networks as follows:

•  Their  algorithm  is  inherently  parallel. 




•  They  are  trainable. 

•  They  can  solve  high-dimensional  problems. 

•  They  offer  an  automated  classification  function. 

•  They  are  error  tolerant  and  can  handle  noisy  data. 

•  They  are  flexible  to  the  choice  of  network  topology. 

In  particular,  we  used  Feed-forward  neural  networks  to  enhance  the  Higgs  boson 
and  the  top  quark  signals  and  their  results  are  summarized  in  the  next  two  sections. 

6.2    Higgs  Data  Analysis 

As a first real application, we applied Neural Networks in Higgs boson phenomenology.
We demonstrated that neural networks are a useful tool in identifying
the Higgs boson processes. Using observables that measure how transverse energy
and mass, respectively, are distributed around the away-side jet-jet system,
a neural network can help to distinguish the two jet system originating from the
$q\bar{q}$ decay of a color singlet Z boson from a random jet-pair coming from the "ordinary"
QCD gluon bremsstrahlung of colored quarks and gluons. We have used
the neural network in conjunction with the standard Higgs boson cuts to provide
additional signal to background enhancements. Our procedure can be summarized
by the following series of selections and cuts:

•  Lepton  pair  trigger. 

•  Jet-pair  selection. 

•  Jet-jet  profile  cuts. 

•  Jet-jet  invariant  mass  cuts. 




•  Neural  network  cut-off 

The  invariant  mass  of  the  jet-pair  is  used  only  in  the  selection  of  events.  The 
Higgs  mass  is  reconstructed  from  the  momentum  of  the  jet-pair  with  Mjj  set  equal 
to  Mz.  We  are  able  to  obtain  an  overall  signal  to  background  enhancement  of 
around  10  with  the  standard  Higgs  boson  cuts.  The  neural  network  provides  an 
additional  enhancement  of  4-5  beyond  what  can  be  achieved  with  the  standard  data 
cuts  resulting  in  an  overall  enhancement  of  about  50.  We  believe  that  we  could 
further  improve  the  network  performance  by  using  larger  training  samples  and  by 
increasing  the  number  of  input  variables  to  include  additional  global  information 
such  as  the  number  of  forward  jets  in  the  event. 

Our method works even with a large number of interactions per beam crossing.
This shows that some jet physics can be done even in the large pile-up environment
of the LHC. Although it is very difficult to make a detailed simulation, experiments
at the LHC should be able to do as well or better than our analysis. Furthermore,
our procedure can be applied to W bosons and should help enhance the Higgs $\to
WW \to \ell\nu jj$ signal at hadron colliders as well.

6.3    Top  Data  Analysis 

We have developed a procedure that enhances the top quark signal ($\ell\nu b\bar{b}q\bar{q}$
decay mode) over the W+jets and the $b\bar{b}$+jets background in hadron-hadron collisions.
Our technique can be summarized by the following series of selections and
cuts:

•  Lepton  and  missing  transverse  energy  trigger. 

•  Calorimeter  cell  cuts. 

•  Modified  Fox- Wolfram  shape  parameters  applied  to  the  jets. 




•  Fisher discriminant or neural network cut-off.

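We do not use a conventional "jet trigger". Instead of requiring at least four jets
in the event, we find that it is faster and better to simply cut on the number of
calorimeter cells with transverse energy greater than some minimum. Our $N_{cell} > 8$
with $E_T^{cell}$(min) = 5 GeV cut produces more than a factor of two better enhancement
than the traditional "jet cuts", and adding the ET(sum) > 100 GeV cut gives an
additional relative enhancement of more than a factor of three.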

In addition, we use neural networks and Fisher discriminants in conjunction
with modified Fox-Wolfram "shape" variables to further distinguish the top-pair
signal from background. For example, using the first six Fox-Wolfram moments
(applied to the jets) together with a Fisher cut-off, F > 0.75, provides an additional
enhancement of around 4 with a relative efficiency of about 44%. By combining
the calorimeter cell cuts with the event shape information, we are able to obtain
an overall signal to background enhancement of around 370 with an efficiency of 30%,
and a signal to background ratio of around nine.

6.4    Final  Remarks 

Artificial neural networks have proven to be valuable tools in High Energy
Physics research. We anticipate that investigation of the neural algorithms and
their application in the form of software and hardware will continue to grow and become
an active field of research in its own right. High Energy Physics, in particular,
should benefit a great deal from this area of research because it often
encounters classification problems in a complex multidimensional space for which
conventional approaches are not sufficient. We hope that we have made a strong case
in favor of Artificial Neural Networks and encourage further investigation of their
application in High Energy Physics.


APPENDIX  A 
H,  W±,  AND  Z  DECAY  MODES 


In this appendix various decay modes of the Standard Model Higgs boson, W
and Z vector bosons and the top quark are studied and the corresponding decay
rates and branching ratios are calculated.

A.1    Z Decay Into Lepton Pair

This decay process is calculated in full detail starting from the interaction
Lagrangian involving the Z and the lepton pair. In the remaining sections we shorten
the procedure by using the Feynman rules and writing the invariant amplitude
directly. The relevant portion of the Lagrangian in the Weyl representation is


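\[
\mathcal{L}_{int} = -\frac{e}{\cos\theta_W \sin\theta_W}\, Z_\mu
\left[ -\tfrac{1}{2}\, e_L^\dagger \bar\sigma^\mu e_L
+ \sin^2\theta_W \left( e_L^\dagger \bar\sigma^\mu e_L + e_R^\dagger \sigma^\mu e_R \right) \right].
\]

In Dirac notation $\mathcal{L}_{int}$ is

\[
\mathcal{L}_{int} = \frac{g}{4\cos\theta_W}\, Z_\mu\, \bar e \gamma^\mu
\left( 1 - \gamma_5 - 4\sin^2\theta_W \right) e,
\]

where $g = e/\sin\theta_W$. We want to calculate the transition amplitude from the
initial state $|Z_\lambda(k)\rangle$ to the final state $|e^-(p_1), e^+(p_2)\rangle$,

\[
T_{fi} = \langle f |\, i \int d^4x\, \mathcal{L}_{int} \,| i \rangle ,
\]

where

\[
|i\rangle = |Z_\lambda(k)\rangle = a^\dagger_\lambda(k)|0\rangle ,
\]
\[
|f\rangle = |e^-_{s_1}(p_1), e^+_{s_2}(p_2)\rangle = c^\dagger_{s_1}(p_1)\, d^\dagger_{s_2}(p_2)|0\rangle ,
\]
\[
T_{fi} = \frac{ig}{4\cos\theta_W} \int d^4x\, \langle 0 |\, c_{s_1}(p_1)\, d_{s_2}(p_2)\,
Z_\mu\, \bar e \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right) e\;
a^\dagger_\lambda(k) |0\rangle .
\]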
The  field  operators  are 

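\[
e(x) = \sum_{s,p} \frac{1}{\sqrt{V 2E(p)}}
\left( c_s(p) u_s(p) e^{-ix\cdot p} + d^\dagger_s(p) v_s(p) e^{ix\cdot p} \right),
\]
\[
\bar e(x) = \sum_{s,p} \frac{1}{\sqrt{V 2E(p)}}
\left( d_s(p) \bar v_s(p) e^{-ix\cdot p} + c^\dagger_s(p) \bar u_s(p) e^{ix\cdot p} \right),
\]
\[
Z_\mu(x) = \sum_{\lambda,k} \frac{1}{\sqrt{V 2\omega(k)}}
\left( a_\lambda(k) \epsilon_\mu(\lambda,k) e^{-ik\cdot x}
+ a^\dagger_\lambda(k) \epsilon^*_\mu(\lambda,k) e^{ik\cdot x} \right).
\]

It is easy to see that in the Dirac fields only the terms involving $c^\dagger_s$ and
$d^\dagger_s$ contribute to the matrix element, and for the neutral vector boson field
only the term involving $a_\lambda$ contributes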

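\[
T_{fi} = \frac{ig}{4\cos\theta_W} \int d^4x\, \frac{1}{\sqrt{V^3\, 2E_1\, 2E_2\, 2\omega_k}}
\left[ \bar u(s_1,p_1) e^{ix\cdot p_1} \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right)
v(s_2,p_2) e^{ix\cdot p_2}\, \epsilon_\mu(\lambda,k) e^{-ik\cdot x} \right].
\]

The space-time dependent part of the integral is just the delta function,

\[
\int d^4x\, e^{-i(k-p_1-p_2)\cdot x} = (2\pi)^4 \delta^4(k-p_1-p_2),
\]

hence

\[
T_{fi} = \frac{ig}{4\cos\theta_W}\,
\frac{(2\pi)^4 \delta^4(k-p_1-p_2)}{\sqrt{V^3\, 2E_1\, 2E_2\, 2\omega_k}}\;
\epsilon_\mu(\lambda,k)\, \bar u(s_1,p_1) \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right) v(s_2,p_2).
\]

Let us define the invariant amplitude $\mathcal{M}$ as

\[
-i\mathcal{M} = \frac{ig}{4\cos\theta_W}\, \epsilon_\mu(\lambda,k)\,
\bar u(s_1,p_1) \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right) v(s_2,p_2).
\]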

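The transition rate per unit volume per unit time is $W_{fi} = |T_{fi}|^2/(TV)$,

\[
W_{fi} = \frac{1}{TV}\,
\frac{(2\pi)^8\, |\delta^4(k-p_1-p_2)|^2\, |\mathcal{M}|^2}{V^3\, 2E_1\, 2E_2\, 2\omega_k}
= \frac{(2\pi)^4\, \delta^4(k-p_1-p_2)\, |\mathcal{M}|^2}{V^3\, 2E_1\, 2E_2\, 2\omega_k}.
\]

Here the relation $|\delta^4(k-p_1-p_2)|^2 = \frac{TV}{(2\pi)^4}\,\delta^4(k-p_1-p_2)$ is
used. To get the decay rate, $d\Gamma$, we should integrate $W_{fi}$ over all momentum
states allowed for the final particles, $\frac{V d^3p}{(2\pi)^3}$, and divide by the
density of the decaying particles, $\rho = 1/V$, where the normalization condition
corresponds to one particle in the volume $V$:

\[
d\Gamma = \frac{(2\pi)^4\, \delta^4(k-p_1-p_2)\, |\mathcal{M}|^2}{V^3\, 2E_1\, 2E_2\, 2\omega_k\, \frac{1}{V}}\;
\frac{V d^3p_1}{(2\pi)^3}\, \frac{V d^3p_2}{(2\pi)^3},
\]
\[
d\Gamma = (2\pi)^4\, \delta^4(k-p_1-p_2)\, \frac{|\mathcal{M}|^2}{2\omega_k}\,
\frac{d^3p_1}{2E_1 (2\pi)^3}\, \frac{d^3p_2}{2E_2 (2\pi)^3}.
\]

Here $\omega_k = \sqrt{|\vec k|^2 + M_Z^2}$ is the energy of the vector boson, Z, and in
the rest frame of the Z we have $\omega_k = M_Z$. We can generalize the above
expression as follows. For decay of a particle into N final particles, the
differential decay rate is:

\[
d\Gamma = (2\pi)^4\, \delta^4\!\left( P - \sum_{i=1}^{N} p_i \right) \frac{|\mathcal{M}|^2}{2E}
\prod_{i=1}^{N} \frac{d^3p_i}{(2\pi)^3\, 2E_i},
\]

where $P = (E, \vec P)$, and E is the energy of the decaying particle. In general
$\Gamma$ is calculated in the rest frame of the decaying particle, in which case E is
replaced by the mass, M. One side-note is that for a particle of mass M decaying into
two particles of mass m and momentum $p_f$, the total decay rate can be shown to be

\[
\Gamma = \frac{p_f}{8\pi M^2}\, |\mathcal{M}|^2 .
\]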

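To calculate the decay rate, therefore, we need to calculate $|\mathcal{M}|^2$. We have

\[
-i\mathcal{M} = \frac{ig}{4\cos\theta_W}\, \epsilon_\mu(\lambda,k)\,
\bar u(s_1,p_1) \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right) v(s_2,p_2).
\]

For the unpolarized vector boson we can square $\mathcal{M}$, sum over the final spin
states, and average over the initial polarization of the Z:

\[
|\mathcal{M}|^2 = \frac{1}{3}\, \frac{g^2}{16\cos^2\theta_W}
\sum_{\lambda s_1 s_2} \epsilon_\mu(\lambda,k)\, \epsilon^*_\nu(\lambda,k)\;
\bar u(s_1,p_1) \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right) v(s_2,p_2)\;
\bar v(s_2,p_2) \left( 1 + \gamma_5 - 4\sin^2\theta_W \right) \gamma^\nu u(s_1,p_1).
\]

Using the completeness relation

\[
\sum_\lambda \epsilon^*_\mu(\lambda,k)\, \epsilon_\nu(\lambda,k) = -g_{\mu\nu} + \frac{k_\mu k_\nu}{M_Z^2},
\]

we obtain

\[
|\mathcal{M}|^2 = \frac{g^2}{48\cos^2\theta_W}
\left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M_Z^2} \right)
\sum_{s_1 s_2} \mathrm{Tr}\left\{ u(s_1,p_1) \bar u(s_1,p_1) \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right)
v(s_2,p_2) \bar v(s_2,p_2) \left( 1 + \gamma_5 - 4\sin^2\theta_W \right) \gamma^\nu \right\}
\]
\[
= \frac{g^2}{48\cos^2\theta_W}
\left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M_Z^2} \right)
\mathrm{Tr}\left\{ (\not{p}_1 + m) \gamma^\mu \left( 1 - \gamma_5 - 4\sin^2\theta_W \right)
(\not{p}_2 - m) \left( 1 + \gamma_5 - 4\sin^2\theta_W \right) \gamma^\nu \right\}
\]
\[
= \frac{g^2}{48\cos^2\theta_W}
\left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M_Z^2} \right)
\mathrm{Tr}\left\{ (\not{p}_1 + m) \gamma^\mu \left( X \not{p}_2 + Y \not{p}_2 \gamma_5 + Z m \right) \gamma^\nu \right\},
\]

where in the last line we expanded the trace, and for algebraic convenience
introduced three constants, X, Y, and Z. These three constants are:

\[
X = 2 - 8\sin^2\theta_W + 16\sin^4\theta_W ,
\]
\[
Y = 2 - 8\sin^2\theta_W ,
\]
\[
Z = 8\sin^2\theta_W - 16\sin^4\theta_W .
\]

Using the properties of the gamma matrices and the trace relations listed in
Appendix B we can further simplify the trace terms

\[
\mathrm{Tr}\left[ X \not{p}_1 \gamma^\mu \not{p}_2 \gamma^\nu
- Y \not{p}_1 \gamma^\mu \not{p}_2 \gamma^\nu \gamma_5 + Z m^2 \gamma^\mu \gamma^\nu \right]
= 4 \left[ X \left( p_1^\mu p_2^\nu + p_1^\nu p_2^\mu - p_1 \cdot p_2\, g^{\mu\nu} \right)
- iY p_{1\alpha} p_{2\beta}\, \epsilon^{\alpha\mu\beta\nu} + Z m^2 g^{\mu\nu} \right].
\]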


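The invariant amplitude then becomes

\[
|\mathcal{M}|^2 = \frac{g^2}{12\cos^2\theta_W}
\left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M_Z^2} \right)
\left[ X \left( p_1^\mu p_2^\nu + p_1^\nu p_2^\mu - p_1 \cdot p_2\, g^{\mu\nu} \right)
- iY p_{1\alpha} p_{2\beta}\, \epsilon^{\alpha\mu\beta\nu} + Z m^2 g^{\mu\nu} \right].
\]

The algebra will be greatly simplified if we go to the rest frame of the decaying
particle where:

\[
k = (M_Z, \vec 0), \qquad p_1 = (E, \vec p), \qquad p_2 = (E, -\vec p), \qquad E = \frac{M_Z}{2}.
\]

Therefore, $k_\mu k_\nu \epsilon^{\alpha\mu\beta\nu} = 0$ due to the antisymmetry of
$\epsilon$. For the same reason we have $g_{\mu\nu} \epsilon^{\alpha\mu\beta\nu} = 0$.
Now, using the explicit form of the four-vectors given above we get

\[
|\mathcal{M}|^2 = \frac{g^2}{12\cos^2\theta_W}
\left\{ X (p_1 \cdot p_2) + 2XE^2 - 3Zm^2 \right\},
\]
\[
(p_1 \cdot p_2) = E^2 + p_f^2, \qquad p_f = |\vec p|, \qquad
p_f^2 = E^2 - m^2 = \frac{M_Z^2}{4} - m^2,
\]
\[
|\mathcal{M}|^2 = \frac{g^2}{12\cos^2\theta_W}
\left[ X p_f^2 + 3XE^2 - 3Zm^2 \right].
\]

The total decay rate is given by:

\[
\Gamma = \frac{p_f}{8\pi M_Z^2}\, |\mathcal{M}|^2 ,
\]
\[
\Gamma(Z \to \ell\bar\ell) = \frac{g^2\, p_f}{96\pi \cos^2\theta_W M_Z^2}
\left[ X p_f^2 + 3XE^2 - 3Zm^2 \right]
\]
\[
= \frac{g^2 M_Z}{96\pi \cos^2\theta_W}\, \frac{1}{8}
\left( 1 - \frac{4m^2}{M_Z^2} \right)^{1/2}
\left[ 4X \left( 1 - \frac{m^2}{M_Z^2} \right) - \frac{12 Z m^2}{M_Z^2} \right].
\]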


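This is our final result given in terms of X, Z, and the masses. In general, however,
the neutral Z boson is much heavier than the leptons, $(m/M_Z)^2 \ll 1$, which allows
the following simplifications:

\[
p_f \simeq E = \frac{M_Z}{2},
\]
\[
\Gamma(Z \to e\bar e) = \frac{g^2 M_Z}{96\pi \cos^2\theta_W}\, \frac{X}{2}
= \frac{g^2 M_Z}{96\pi \cos^2\theta_W} \left( 1 - 4\sin^2\theta_W + 8\sin^4\theta_W \right)
= \frac{G_F M_Z^3}{12\pi\sqrt{2}} \left( 1 - 4\sin^2\theta_W + 8\sin^4\theta_W \right),
\]

where in the last step we used the relations
$\frac{g^2}{8} = \frac{G_F M_W^2}{\sqrt{2}}$ and $M_W = \cos\theta_W M_Z$.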
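As a numerical illustration, the following minimal sketch evaluates the leptonic
width both from the mass-dependent expression and from its massless limit. The
values of $G_F$ and $\sin^2\theta_W$ are the ones quoted in Appendix B; the Z mass
(91.19 GeV) and the tau mass (1.777 GeV) are assumptions introduced here only for
the example, and the function names are illustrative.

```python
import math

G_F = 1.16632e-5        # GeV^-2, value quoted in Appendix B
SIN2_THETA_W = 0.2325   # value quoted in Appendix B
M_Z = 91.19             # GeV, assumed for this example

def z_to_lepton_width(m_lepton, m_z=M_Z, s2=SIN2_THETA_W, g_f=G_F):
    """Tree-level width Gamma(Z -> l lbar) in GeV, keeping the lepton mass."""
    x_const = 2 - 8 * s2 + 16 * s2 ** 2
    z_const = 8 * s2 - 16 * s2 ** 2
    # g^2 / cos^2(theta_W) = 8 G_F M_Z^2 / sqrt(2), from g^2/8 = G_F M_W^2 / sqrt(2)
    g2_over_cos2 = 8 * g_f * m_z ** 2 / math.sqrt(2)
    energy = m_z / 2
    p_f = math.sqrt(energy ** 2 - m_lepton ** 2)
    amp2 = g2_over_cos2 / 12 * (
        x_const * p_f ** 2 + 3 * x_const * energy ** 2 - 3 * z_const * m_lepton ** 2
    )
    return p_f / (8 * math.pi * m_z ** 2) * amp2

def z_to_lepton_width_massless(m_z=M_Z, s2=SIN2_THETA_W, g_f=G_F):
    """Massless-lepton limit of the same width, in GeV."""
    return g_f * m_z ** 3 / (12 * math.pi * math.sqrt(2)) * (1 - 4 * s2 + 8 * s2 ** 2)

print(f"massless limit : {1000 * z_to_lepton_width_massless():.1f} MeV")
print(f"tau (1.777 GeV): {1000 * z_to_lepton_width(1.777):.1f} MeV")
```

With these inputs the massless limit comes out at roughly 83 MeV, and the mass
correction for the tau is well below one percent, which supports the
simplification made above.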

A.2    Vector Bosons Decay Modes

In  this  section,  we  use  a  general  approach  to  calculate  tree  level  decay  modes  of 
the  W  and  Z  bosons  in  one  shot.  This  approach  relies  on  the  fact  that  the  vertex 
terms  involving  vector  bosons  are  very  similar.  Decay  of  a  massive  vector  boson 
into  two  fermions  can  be  written  in  the  following  general  form 

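\[
(W^\pm, Z) \to f_1 \bar f_2 ,
\]

with a generic vertex term of the form

\[
\frac{iA}{\sqrt{2}} \left( \frac{G_F M^2}{\sqrt{2}} \right)^{1/2}
\gamma^\mu \left( B(1 - \gamma_5) - 4C \sin^2\theta_W \right),
\]

where M is the mass of the gauge particle, and A, B, and C are different parameters
corresponding to different decay processes. We begin by writing $\mathcal{M}$, the
invariant amplitude

\[
-i\mathcal{M} = \epsilon_\mu(k,\lambda)\, \bar u(p_1,s_1)
\left[ \frac{iA}{\sqrt{2}} \left( \frac{G_F M^2}{\sqrt{2}} \right)^{1/2}
\gamma^\mu \left( B(1 - \gamma_5) - 4C \sin^2\theta_W \right) \right] v(p_2,s_2).
\]

Let us introduce the notation

\[
x_w \equiv \sin^2\theta_W .
\]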


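Squaring $\mathcal{M}$ and summing over the final spin states and averaging over the
initial polarization state of the vector particle, we get

\[
|\mathcal{M}|^2 = \frac{1}{3}\, \frac{A^2}{2} \left( \frac{M^2 G_F}{\sqrt{2}} \right)
\sum_{\lambda s_1 s_2} \epsilon_\mu(k,\lambda)\, \epsilon^*_\nu(k,\lambda)\;
\bar u(p_1,s_1) \gamma^\mu \left( B(1 - \gamma_5) - 4Cx_w \right) v(p_2,s_2)\;
\bar v(p_2,s_2) \left( B(1 + \gamma_5) - 4Cx_w \right) \gamma^\nu u(p_1,s_1)
\]
\[
= \frac{A^2 M^2 G_F}{6\sqrt{2}} \left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M^2} \right)
\sum_{s_1 s_2} \mathrm{Tr}\left\{ u(p_1,s_1) \bar u(p_1,s_1) \gamma^\mu
\left( B(1 - \gamma_5) - 4Cx_w \right) v(p_2,s_2) \bar v(p_2,s_2)
\left( B(1 + \gamma_5) - 4Cx_w \right) \gamma^\nu \right\}
\]
\[
= \frac{A^2 M^2 G_F}{6\sqrt{2}} \left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M^2} \right)
\mathrm{Tr}\left\{ (\not{p}_1 + m_1) \gamma^\mu \left( B(1 - \gamma_5) - 4Cx_w \right)
(\not{p}_2 - m_2) \left( B(1 + \gamma_5) - 4Cx_w \right) \gamma^\nu \right\}
\]
\[
= \frac{A^2 M^2 G_F}{6\sqrt{2}} \left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M^2} \right)
\mathrm{Tr}\left\{ (\not{p}_1 + m_1) \gamma^\mu \left( X \not{p}_2 + Y \not{p}_2 \gamma_5 + Z m_2 \right) \gamma^\nu \right\},
\]

where in the last line we expanded the trace, and for algebraic convenience
introduced three constants, X, Y, and Z which stand for

\[
X = 2B^2 - 8BCx_w + 16C^2 x_w^2 ,
\]
\[
Y = 2B^2 - 8BCx_w ,
\]
\[
Z = 8BCx_w - 16C^2 x_w^2 .
\]

Using the properties of the gamma matrices and the trace relations listed in
Appendix B we can further simplify the trace terms

\[
\mathrm{Tr}\left[ X \not{p}_1 \gamma^\mu \not{p}_2 \gamma^\nu
- Y \not{p}_1 \gamma^\mu \not{p}_2 \gamma^\nu \gamma_5 + Z m_1 m_2 \gamma^\mu \gamma^\nu \right]
= 4 \left[ X \left( p_1^\mu p_2^\nu + p_1^\nu p_2^\mu - p_1 \cdot p_2\, g^{\mu\nu} \right)
- iY p_{1\alpha} p_{2\beta}\, \epsilon^{\alpha\mu\beta\nu} + Z m_1 m_2\, g^{\mu\nu} \right].
\]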


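The invariant amplitude then becomes

\[
|\mathcal{M}|^2 = A^2\, \frac{2 M^2 G_F}{3\sqrt{2}}
\left( -g_{\mu\nu} + \frac{k_\mu k_\nu}{M^2} \right)
\left[ X \left( p_1^\mu p_2^\nu + p_1^\nu p_2^\mu - p_1 \cdot p_2\, g^{\mu\nu} \right)
- iY p_{1\alpha} p_{2\beta}\, \epsilon^{\alpha\mu\beta\nu} + Z m_1 m_2\, g^{\mu\nu} \right].
\]

The algebra will be greatly simplified if we go to the rest frame of the decaying
particle where

\[
k = (M, \vec 0), \qquad p_1 = (E_1, \vec p), \qquad p_2 = (E_2, -\vec p).
\]

Therefore, $k_\mu k_\nu \epsilon^{\alpha\mu\beta\nu} = 0$ due to the antisymmetry of
$\epsilon$. For the same reason we have $g_{\mu\nu} \epsilon^{\alpha\mu\beta\nu} = 0$.
Now, using the explicit form of the four-vectors given above we get

\[
|\mathcal{M}|^2 = A^2\, \frac{2 M^2 G_F}{3\sqrt{2}}
\left[ X (p_1 \cdot p_2) + 2X E_1 E_2 - 3Z m_1 m_2 \right],
\]
\[
(p_1 \cdot p_2) = E_1 E_2 + p_f^2 , \qquad p_f = |\vec p| ,
\]
\[
|\mathcal{M}|^2 = A^2\, \frac{2 M^2 G_F}{3\sqrt{2}}
\left[ X p_f^2 + 3X E_1 E_2 - 3Z m_1 m_2 \right].
\]

The total decay rate is given by:

\[
\Gamma = \frac{p_f}{8\pi M^2}\, |\mathcal{M}|^2 ,
\]
\[
\Gamma = A^2\, \frac{G_F\, p_f}{12\pi\sqrt{2}}
\left[ X p_f^2 + 3X E_1 E_2 - 3Z m_1 m_2 \right].
\]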

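The above expression is our final result for the decay width of a vector boson
into two fermions. We could modify it, if we wish, by writing $E_1$, $E_2$, and $p_f$
in terms of known masses. Using the invariant relations $(k - p_1)^2 = p_2^2$ and
$(k - p_2)^2 = p_1^2$ we can find:

\[
E_1 = \frac{M^2 + m_1^2 - m_2^2}{2M}, \qquad
E_2 = \frac{M^2 + m_2^2 - m_1^2}{2M},
\]
\[
E_1 E_2 = \frac{M^2}{4} \left[ 1 - \frac{(m_1^2 - m_2^2)^2}{M^4} \right].
\]

To get $p_f$, we use $E_1$ in $p_f^2 = E_1^2 - m_1^2$:

\[
p_f^2 = \frac{M^2}{4} \left[ \left( 1 - \frac{m_1^2 + m_2^2}{M^2} \right)^2
- \left( \frac{2 m_1 m_2}{M^2} \right)^2 \right]
= \frac{M^2}{4} \left[ 1 - \frac{(m_1 + m_2)^2}{M^2} \right]
\left[ 1 - \frac{(m_1 - m_2)^2}{M^2} \right].
\]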

Of course the constants X and Z as defined earlier are:

\[
X = 2B^2 - 8BCx_w + 16C^2 x_w^2 ,
\]
\[
Z = 8BCx_w - 16C^2 x_w^2 , \qquad x_w = \sin^2\theta_W .
\]


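In general W and Z are much heavier than the fermions, $(m_f/M)^2 \ll 1$, which
allows the following simplifications

\[
E_1 \simeq E_2 \simeq p_f \simeq \frac{M}{2},
\]
\[
\Gamma = A^2\, \frac{G_F}{12\pi\sqrt{2}}\, \frac{M}{2} \left( X M^2 \right)
\]
\[
= A^2\, \frac{G_F M^3}{12\pi\sqrt{2}} \left( B^2 - 4BCx_w + 8C^2 x_w^2 \right)
\]
\[
= A^2\, \frac{G_F M^3}{12\pi\sqrt{2}} \left( B^2 - 4BC \sin^2\theta_W + 8C^2 \sin^4\theta_W \right).
\]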

Let us apply the above result to some specific cases. In each case we can easily
read off the constants A, B, and C from the vertex term, and the corresponding
$\Gamma$ is readily obtained.

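1.  $W \to \ell \bar\nu_\ell$

The vertex term for this process is:

\[
i \left( \frac{G_F M_W^2}{\sqrt{2}} \right)^{1/2} \gamma^\mu (1 - \gamma_5),
\]
\[
A = \sqrt{2}, \qquad B = 1, \qquad C = 0, \qquad M = M_W ,
\]
\[
\Gamma(W \to \ell \bar\nu_\ell) = \frac{G_F M_W^3}{6\sqrt{2}\pi}.
\]

2.  $W \to u \bar d_c$

In this case a factor of 3 should be included in the calculation of $\Gamma$ to
account for the three color states of the quarks.

\[
A = \sqrt{2}, \qquad B = 1, \qquad C = 0, \qquad M = M_W ,
\]
\[
\Gamma(W \to u \bar d_c) = 3\, \frac{G_F M_W^3}{6\sqrt{2}\pi}.
\]

3.  $Z \to \nu \bar\nu$

\[
\frac{i}{\sqrt{2}} \left( \frac{M_Z^2 G_F}{\sqrt{2}} \right)^{1/2} \gamma^\mu (1 - \gamma_5),
\]
\[
A = 1, \qquad B = 1, \qquad C = 0, \qquad M = M_Z ,
\]
\[
\Gamma(Z \to \nu \bar\nu) = \frac{G_F M_Z^3}{12\sqrt{2}\pi}.
\]

4.  $Z \to \ell \bar\ell$

\[
\frac{i}{\sqrt{2}} \left( \frac{M_Z^2 G_F}{\sqrt{2}} \right)^{1/2}
\gamma^\mu \left[ (1 - \gamma_5) - 4\sin^2\theta_W \right],
\]
\[
A = 1, \qquad B = 1, \qquad C = 1, \qquad M = M_Z ,
\]
\[
\Gamma(Z \to \ell \bar\ell) = \frac{G_F M_Z^3}{12\sqrt{2}\pi}
\left( 1 - 4\sin^2\theta_W + 8\sin^4\theta_W \right). \tag{A.1}
\]

5.  $Z \to u \bar u$

\[
\frac{i}{\sqrt{2}} \left( \frac{M_Z^2 G_F}{\sqrt{2}} \right)^{1/2}
\gamma^\mu \left[ (1 - \gamma_5) - 4 Q_u \sin^2\theta_W \right],
\]
\[
A = 1, \qquad B = 1, \qquad C = \frac{2}{3}, \qquad M = M_Z .
\]

We include a factor of 3 for color.

\[
\Gamma(Z \to u \bar u) = \frac{G_F M_Z^3}{4\sqrt{2}\pi}
\left( 1 - \frac{8}{3} x_w + \frac{32}{9} x_w^2 \right)
= \frac{G_F M_Z^3}{4\sqrt{2}\pi}
\left( 1 - \frac{8}{3} \sin^2\theta_W + \frac{32}{9} \sin^4\theta_W \right). \tag{A.2}
\]

6.  $Z \to d \bar d$

\[
\frac{i}{\sqrt{2}} \left( \frac{M_Z^2 G_F}{\sqrt{2}} \right)^{1/2}
\gamma^\mu \left[ -(1 - \gamma_5) - 4 Q_d \sin^2\theta_W \right],
\]
\[
A = 1, \qquad B = -1, \qquad C = -\frac{1}{3}, \qquad M = M_Z .
\]

Including a factor of 3 for color.

\[
\Gamma(Z \to d \bar d) = \frac{G_F M_Z^3}{4\sqrt{2}\pi}
\left( 1 - \frac{4}{3} \sin^2\theta_W + \frac{8}{9} \sin^4\theta_W \right). \tag{A.3}
\]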
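The six cases above all follow from the single massless-fermion formula
$\Gamma = N_c A^2 \frac{G_F M^3}{12\pi\sqrt{2}} (B^2 - 4BCx_w + 8C^2x_w^2)$, so they can
be tabulated with one small function. The sketch below is illustrative only: the
Z mass of 91.19 GeV is an assumption, the W mass is taken from the tree-level
relation $M_W = \cos\theta_W M_Z$, and the channel labels are just dictionary keys.

```python
import math

G_F = 1.16632e-5   # GeV^-2, value quoted in Appendix B
X_W = 0.2325       # sin^2(theta_W), value quoted in Appendix B
M_Z = 91.19        # GeV, assumed for this example
M_W = M_Z * math.sqrt(1.0 - X_W)   # tree-level relation M_W = cos(theta_W) M_Z

def vector_boson_width(A, B, C, M, n_colour=1, x_w=X_W, g_f=G_F):
    """Massless-fermion width (GeV) for the generic vertex parameters A, B, C."""
    shape = B ** 2 - 4 * B * C * x_w + 8 * C ** 2 * x_w ** 2
    return n_colour * A ** 2 * g_f * M ** 3 / (12 * math.pi * math.sqrt(2)) * shape

# (A, B, C, M, colour factor) as read off from the vertex terms listed above.
CHANNELS = {
    "W -> l nu":     (math.sqrt(2), 1, 0, M_W, 1),
    "W -> u dbar":   (math.sqrt(2), 1, 0, M_W, 3),
    "Z -> nu nubar": (1, 1, 0, M_Z, 1),
    "Z -> l lbar":   (1, 1, 1, M_Z, 1),
    "Z -> u ubar":   (1, 1, 2 / 3, M_Z, 3),
    "Z -> d dbar":   (1, -1, -1 / 3, M_Z, 3),
}

WIDTHS = {name: vector_boson_width(*params) for name, params in CHANNELS.items()}

for name, width in WIDTHS.items():
    print(f"{name:14s} {1000 * width:7.1f} MeV")
```

Because fermion masses and higher-order corrections are neglected, the printed
numbers should be read as tree-level estimates rather than precise predictions.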

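A.3    Higgs Boson Decay Modes

$H \to \ell \bar\ell$

Here $\ell\bar\ell$ stands for a lepton anti-lepton pair. The vertex term is
$-i m_\ell \left( G_F \sqrt{2} \right)^{1/2}$ and the invariant amplitude, $\mathcal{M}$, is:

\[
-i\mathcal{M} = -i m_\ell \left( G_F \sqrt{2} \right)^{1/2} \bar u_{s_1}(p_1)\, v_{s_2}(p_2).
\]

Squaring $\mathcal{M}$ and summing over all final spin states:

\[
|\mathcal{M}|^2 = \left( G_F m_\ell^2 \sqrt{2} \right) \sum_{s_1 s_2}
\mathrm{Tr}\left\{ u_{s_1}(p_1) \bar u_{s_1}(p_1) v_{s_2}(p_2) \bar v_{s_2}(p_2) \right\}
\]
\[
= \left( G_F m_\ell^2 \sqrt{2} \right) \mathrm{Tr}\left( \not{p}_1 \not{p}_2 - m_\ell^2 \right)
\]
\[
= \left( G_F m_\ell^2 \sqrt{2} \right) 4 \left( p_1 \cdot p_2 - m_\ell^2 \right)
\]
\[
= \left( G_F m_\ell^2 \sqrt{2} \right) 2 \left( M_h^2 - 4 m_\ell^2 \right), \tag{A.4}
\]

where in the last step we used the invariant relation $P_h^2 = (p_1 + p_2)^2$ and
solved for $2 p_1 \cdot p_2 = M_h^2 - 2 m_\ell^2$. Using the differential decay rate
given in Appendix B we can write

\[
d\Gamma = \frac{1}{32\pi^2}\, \frac{p_f}{M_h^2}\, |\mathcal{M}|^2\, d\Omega ,
\]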


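where $p_f = |\vec p_1| = |\vec p_2|$ is the final momentum of the decay products. We
can write $p_f$ in terms of masses using the invariant relation
$2 p_1 \cdot p_2 = M_h^2 - 2 m_\ell^2$. Writing $p_1 \cdot p_2$ explicitly we have

\[
p_1 \cdot p_2 = E_1 E_2 - \vec p_1 \cdot \vec p_2 = \left( \frac{M_h}{2} \right)^2 + p_f^2 .
\]

Here we used the fact that for an $\ell\bar\ell$ pair in the C.M. frame
$E_1 = E_2 = M_h/2$, and $\vec p_1 = -\vec p_2$. Next,

\[
p_f = \frac{1}{2} \left( M_h^2 - 4 m_\ell^2 \right)^{1/2}.
\]

Substituting for $p_f$ in the decay rate we get

\[
d\Gamma = \frac{1}{32\pi^2}\, \frac{\left( M_h^2 - 4 m_\ell^2 \right)^{1/2}}{2 M_h^2}
\left( G_F m_\ell^2 \sqrt{2} \right) 2 \left( M_h^2 - 4 m_\ell^2 \right) d\Omega .
\]

Integrating over $\Omega$:

\[
\Gamma = \frac{1}{8\pi}\, \frac{\left( G_F m_\ell^2 \sqrt{2} \right)}{M_h^2}
\left( M_h^2 - 4 m_\ell^2 \right)^{3/2},
\]
\[
\Gamma(H \to \ell\bar\ell) = \frac{\sqrt{2}}{8\pi} \left( G_F m_\ell^2 M_h \right)
\left( 1 - \frac{4 m_\ell^2}{M_h^2} \right)^{3/2}. \tag{A.5}
\]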

$H \to \nu \bar\nu$

For decay of a Higgs into a pair of massive neutrinos, $\Gamma$ is given by the above
expression. Of course, in the limit $m_\nu \to 0$, $\Gamma$ vanishes.

$H \to q \bar q$

For decay of a Higgs boson into a pair of quarks, the vertex function is the
same as that of lepton pairs, that is $-i m_q \left( G_F \sqrt{2} \right)^{1/2}$. The
only difference is that quarks have an additional quantum number, color, over which
$\Gamma$ should be summed. This introduces a factor of three and the decay rate becomes

\[
\Gamma(H \to q \bar q) = \frac{3\sqrt{2}}{8\pi} \left( G_F m_q^2 M_h \right)
\left( 1 - \frac{4 m_q^2}{M_h^2} \right)^{3/2}.
\]
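A short numerical sketch of the fermionic widths follows. It simply codes
Eq. (A.5) with an optional color factor; the Higgs mass of 100 GeV and the
fermion masses used in the example are hypothetical inputs chosen for
illustration, not values assumed elsewhere in this work.

```python
import math

G_F = 1.16632e-5   # GeV^-2, value quoted in Appendix B

def higgs_to_fermions_width(m_h, m_f, n_colour=1, g_f=G_F):
    """Tree-level width Gamma(H -> f fbar) in GeV; zero below threshold."""
    if m_h <= 2 * m_f:
        return 0.0
    beta2 = 1 - 4 * m_f ** 2 / m_h ** 2
    return n_colour * math.sqrt(2) / (8 * math.pi) * g_f * m_f ** 2 * m_h * beta2 ** 1.5

# Hypothetical inputs: a 100 GeV Higgs, a 4.5 GeV quark and a 1.777 GeV lepton.
width_quark = higgs_to_fermions_width(100.0, 4.5, n_colour=3)
width_lepton = higgs_to_fermions_width(100.0, 1.777)
print(f"H -> q qbar : {1000 * width_quark:.2f} MeV")
print(f"H -> l lbar : {1000 * width_lepton:.3f} MeV")
```

The width scales with the square of the fermion mass, so for a given Higgs mass
the heaviest kinematically allowed fermion dominates, and it vanishes in the
massless limit as noted for neutrinos above.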


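$H \to W^+ W^-$

The vertex term is given by $2i M_W^2 \left( \sqrt{2} G_F \right)^{1/2} g^{\alpha\beta}$.
Assuming the Higgs boson is heavier than $2M_W$, we have for the invariant amplitude,
$\mathcal{M}$:

\[
-i\mathcal{M} = \left( i g M_W\, g_{\alpha\beta} \right)
\epsilon^{*\alpha}(\lambda_1, k_+)\, \epsilon^{*\beta}(\lambda_2, k_-) \tag{A.6}
\]
\[
= \left( i g M_W \right) \epsilon^*_{\lambda_1}(k_+) \cdot \epsilon^*_{\lambda_2}(k_-). \tag{A.7}
\]

Squaring $\mathcal{M}$ and summing over the polarization states of the vector particles

\[
|\mathcal{M}|^2 = \sum_{\lambda_1 \lambda_2} \left( g^2 M_W^2 \right)
\epsilon^{*\alpha}(\lambda_1, k_+)\, \epsilon^{\beta}(\lambda_1, k_+)\,
\epsilon^*_{\alpha}(\lambda_2, k_-)\, \epsilon_{\beta}(\lambda_2, k_-).
\]

Using the completeness relation

\[
\sum_\lambda \epsilon^*_\mu(\lambda, k)\, \epsilon_\nu(\lambda, k) = -g_{\mu\nu} + \frac{k_\mu k_\nu}{M_W^2},
\]

we find

\[
|\mathcal{M}|^2 = g^2 M_W^2
\left( -g^{\alpha\beta} + \frac{k_+^\alpha k_+^\beta}{M_W^2} \right)
\left( -g_{\alpha\beta} + \frac{k_{-\alpha} k_{-\beta}}{M_W^2} \right)
\]
\[
= g^2 M_W^2 \left( 2 + \frac{(k_+ \cdot k_-)^2}{M_W^4} \right), \tag{A.9}
\]

where in the last step we used $k_+^2 = k_-^2 = M_W^2$. Furthermore, we can expand the
invariant $P_h^2 = (k_+ + k_-)^2$ to get $(k_+ \cdot k_-) = \frac{1}{2} \left( M_h^2 - 2M_W^2 \right)$:

\[
|\mathcal{M}|^2 = g^2 M_W^2 \left( 2 + \frac{\left( M_h^2 - 2M_W^2 \right)^2}{4 M_W^4} \right). \tag{A.11}
\]

The total rate for two body decay as given in Appendix B is:

\[
\Gamma = \frac{p_f}{8\pi M_h^2}\, |\mathcal{M}|^2 ,
\]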


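where $p_f$ is the final momentum of the W's, and as we did earlier, it can be
shown to be equal to $\frac{1}{2} \left( M_h^2 - 4M_W^2 \right)^{1/2}$:

\[
\Gamma(H \to W^+ W^-) = \frac{1}{8\pi}\, \frac{\left( M_h^2 - 4M_W^2 \right)^{1/2}}{2 M_h^2}\,
\frac{g^2 M_h^4}{4 M_W^2}
\left( 1 - 4\frac{M_W^2}{M_h^2} + 12\frac{M_W^4}{M_h^4} \right) \tag{A.13}
\]
\[
= \frac{1}{8\pi}\, \frac{g^2 M_h^3}{8 M_W^2}
\left( 1 - \frac{4M_W^2}{M_h^2} \right)^{1/2}
\left( 1 - 4\frac{M_W^2}{M_h^2} + 12\frac{M_W^4}{M_h^4} \right) \tag{A.14}
\]
\[
= \frac{G_F M_h^3}{8\pi\sqrt{2}}
\left( 1 - \frac{4M_W^2}{M_h^2} \right)^{1/2}
\left( 1 - 4\frac{M_W^2}{M_h^2} + 12\frac{M_W^4}{M_h^4} \right). \tag{A.15}
\]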


$H \to ZZ$

For this case we can use the results obtained for H decay into a pair of W's,
since in both cases the vertex term is
$2i \left( \sqrt{2} G_F \right)^{1/2} M^2_{(Z,W)}\, g^{\alpha\beta}$, depending on the
final product. So all we have to do is to replace all $M_W$'s in the previous result
by $M_Z$ to get $\Gamma(H \to ZZ)$. In this case, however, we have to consider the
fact that the final state involves two identical particles, so a factor of 1/2 should
be included.

\[
\Gamma(H \to ZZ) = \frac{G_F M_h^3}{16\pi\sqrt{2}}
\left( 1 - \frac{4M_Z^2}{M_h^2} \right)^{1/2}
\left( 1 - 4\frac{M_Z^2}{M_h^2} + 12\frac{M_Z^4}{M_h^4} \right).
\]
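The two gauge-boson widths can be coded in one function, with the identical-particle
factor as a flag. The sketch below is an illustration under stated assumptions: the
boson masses (80.4 and 91.19 GeV) and the 300 GeV Higgs mass are hypothetical inputs,
and the second function re-derives the WW width from the squared amplitude of
Eq. (A.11) as a cross-check of the algebra.

```python
import math

G_F = 1.16632e-5   # GeV^-2, value quoted in Appendix B

def higgs_to_vv_width(m_h, m_v, identical=False, g_f=G_F):
    """Tree-level width Gamma(H -> V V) in GeV; identical=True adds the 1/2 for ZZ."""
    if m_h <= 2 * m_v:
        return 0.0
    r = m_v ** 2 / m_h ** 2
    width = (g_f * m_h ** 3 / (8 * math.pi * math.sqrt(2))
             * math.sqrt(1 - 4 * r) * (1 - 4 * r + 12 * r ** 2))
    return width / 2 if identical else width

def higgs_to_ww_from_amplitude(m_h, m_w, g_f=G_F):
    """Same WW width, built from p_f/(8 pi M_h^2) |M|^2 with |M|^2 of Eq. (A.11)."""
    g2 = 8 * g_f * m_w ** 2 / math.sqrt(2)            # from g^2/8 = G_F M_W^2/sqrt(2)
    amp2 = g2 * m_w ** 2 * (2 + (m_h ** 2 - 2 * m_w ** 2) ** 2 / (4 * m_w ** 4))
    p_f = 0.5 * math.sqrt(m_h ** 2 - 4 * m_w ** 2)
    return p_f / (8 * math.pi * m_h ** 2) * amp2

# Hypothetical inputs for illustration only.
M_H, M_W, M_Z = 300.0, 80.4, 91.19
print(f"H -> WW : {higgs_to_vv_width(M_H, M_W):.2f} GeV")
print(f"H -> ZZ : {higgs_to_vv_width(M_H, M_Z, identical=True):.2f} GeV")
```

For a Higgs mass far above threshold the mass-dependent factors approach one, so the
WW width tends to twice the ZZ width, reflecting the identical-particle factor.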


APPENDIX  B 
NOTATIONS  AND  CONVENTIONS 


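Some useful relations among $G_F$, $M_Z$, $M_W$, $\theta_W$, $v$, and $\rho$:

\[
M_Z = \frac{M_W}{\cos\theta_W}, \qquad
M_W = \frac{v g}{2}, \qquad
\frac{g^2}{8} = \frac{G_F M_W^2}{\sqrt{2}},
\]
\[
v = \left( G_F \sqrt{2} \right)^{-1/2}, \qquad
g \sin\theta_W = e ,
\]
\[
G_F = 1.16632 \times 10^{-5}\ \mathrm{GeV}^{-2}, \qquad \sin^2\theta_W = 0.2325 .
\]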
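These relations can be evaluated numerically as a quick consistency check. In the
sketch below $G_F$ and $\sin^2\theta_W$ are the values quoted above, while the fine
structure constant $\alpha = 1/137.036$ and the identification $e = \sqrt{4\pi\alpha}$
are assumptions added for the example; the resulting masses are therefore tree-level
estimates only.

```python
import math

G_F = 1.16632e-5          # GeV^-2, value quoted above
SIN2_THETA_W = 0.2325     # value quoted above
ALPHA = 1.0 / 137.036     # assumed low-energy fine structure constant

v = (G_F * math.sqrt(2)) ** -0.5            # vacuum expectation value, GeV
e_charge = math.sqrt(4 * math.pi * ALPHA)   # assumes e^2 = 4 pi alpha
g_weak = e_charge / math.sqrt(SIN2_THETA_W) # from g sin(theta_W) = e
m_w = v * g_weak / 2                        # M_W = v g / 2
m_z = m_w / math.sqrt(1 - SIN2_THETA_W)     # M_Z = M_W / cos(theta_W)

print(f"v   = {v:.1f} GeV")
print(f"g   = {g_weak:.3f}")
print(f"M_W = {m_w:.1f} GeV, M_Z = {m_z:.1f} GeV (tree level)")
```

The value of $v$ depends only on $G_F$ and comes out near 246 GeV; the masses
obtained this way lie a few GeV below the measured ones because radiative
corrections are not included.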


B.1    Feynman Rules

For a Dirac particle we have $u(p,s)$ (incoming) and $\bar u(p,s)$ (outgoing) and for
its antiparticle $v(p,s)$ (outgoing) and $\bar v(p,s)$ (incoming) with

\[
(\not{p} - m)\, u = \bar u\, (\not{p} - m) = 0 ,
\]
\[
(\not{p} + m)\, v = \bar v\, (\not{p} + m) = 0 .
\]

it  (p,  sz  =  1/2)  = 
u(p,sz  =  -1/2)  = 
w(p,*,  =  l/2)  = 



E+m 




v(p,sz  = -1/2)    =    V£  +  m(  W 

^  E+m  |0. 


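\[
u(p,s)\, \bar u(p,s) = (\not{p} + m)\, \tfrac{1}{2} \left( 1 + \gamma_5 \not{s} \right),
\]
\[
v(p,s)\, \bar v(p,s) = (\not{p} - m)\, \tfrac{1}{2} \left( 1 + \gamma_5 \not{s} \right),
\]
\[
\sum_s u(p,s)\, \bar u(p,s) = \not{p} + m ,
\]
\[
\sum_s v(p,s)\, \bar v(p,s) = \not{p} - m ,
\]
\[
\bar u(p,s)\, u(p,s') = 2m\, \delta_{ss'} ,
\]
\[
\bar v(p,s)\, v(p,s') = -2m\, \delta_{ss'} .
\]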


The propagators are

For a scalar
\[
\frac{i}{q^2 - m^2}
\]

For a Dirac particle
\[
\frac{i}{\not{q} - m}
\]

For a vector
\[
-i\, \frac{g^{\mu\nu} - q^\mu q^\nu / m^2}{q^2 - m^2}
\]


B.2    Decay  Rate 

For decay of a particle into N final particles, the differential decay rate is:

\[
d\Gamma = (2\pi)^4\, \delta^4\!\left( P - \sum_{i=1}^{N} p_i \right) \frac{|\mathcal{M}|^2}{2E}
\prod_{i=1}^{N} \frac{d^3p_i}{(2\pi)^3\, 2E_i},
\]

where $P = (E, \vec P)$, and E is the energy of the decaying particle. In general
$\Gamma$ is calculated in the rest frame of the decaying particle, in which case E is
replaced by the mass, M. For a particle of mass M decaying into two particles of
mass m and momentum $p_f$ the total decay rate can be shown to be

\[
\Gamma = \frac{p_f}{8\pi M^2}\, |\mathcal{M}|^2 .
\]

For scattering of two particles of momenta $P_1$ and $P_2$ into N final particles,
the differential cross section is:

\[
d\sigma = (2\pi)^4\, \delta^4\!\left( P_f - P_1 - P_2 \right)
\frac{1}{4 E_1 E_2 |\vec v_1 - \vec v_2|}\, |\mathcal{M}|^2
\prod_{i=1}^{N} \frac{d^3p_i}{(2\pi)^3\, 2E_i}. \tag{B.1}
\]

The total decay rates and total cross sections can be obtained by integrating
over all final states and summing over all final spin states. Note, however, that
if the final state contains r identical particles we should include a factor of
$1/r!$ in the above expressions to avoid overcounting. If there are k sets of
identical particles in the final state with $r_1, r_2, r_3, \ldots, r_k$ representing
the number of particles in each set, then the statistical factor required is
$1/\prod_i r_i!$.

For the two particle decay rate in the rest frame of the particle we can show:

\[
\Gamma = \frac{p_f}{8\pi M^2}\, |\mathcal{M}|^2 , \tag{B.3}
\]

where M is the mass of the decaying particle, and $p_f$ is the momentum of the
final particles.

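The differential and total cross section for the process $AB \to CD$ in the center
of mass frame are:

\[
d\sigma^{(c.m.)} = \frac{1}{64\pi^2 W}\, \frac{p_f}{p_i}\, |\mathcal{M}|^2\, d\Omega , \tag{B.4}
\]
\[
\sigma^{(c.m.)} = \frac{1}{64\pi^2 W}\, \frac{p_f}{p_i} \int |\mathcal{M}|^2\, d\Omega , \tag{B.6}
\]

where $W = (E_A + E_B)^2$, $|\vec P_A| = |\vec P_B| = p_i$ and $|\vec P_C| = |\vec P_D| = p_f$.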

B.3    Traces  And  Contraction  Identities 

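Using the anticommutation relation between the $\gamma$'s we can show

\[
\gamma_\mu \gamma^\mu = 4 , \qquad
\gamma_\mu \gamma^\alpha \gamma^\mu = -2\gamma^\alpha ,
\]
\[
\gamma_\mu \gamma^\alpha \gamma^\beta \gamma^\mu = 4 g^{\alpha\beta} , \qquad
\gamma_\mu \gamma^\alpha \gamma^\beta \gamma^\gamma \gamma^\mu = -2 \gamma^\gamma \gamma^\beta \gamma^\alpha ,
\]
\[
\gamma_\mu \gamma^\alpha \gamma^\beta \gamma^\gamma \gamma^\delta \gamma^\mu
= 2 \left( \gamma^\delta \gamma^\alpha \gamma^\beta \gamma^\gamma
+ \gamma^\gamma \gamma^\beta \gamma^\alpha \gamma^\delta \right).
\]

If A, B, C, and D denote four-vectors then:

\[
\gamma_\mu \not{A} \gamma^\mu = -2 \not{A} ,
\]
\[
\gamma_\mu \not{A} \not{B} \gamma^\mu = 4 A \cdot B ,
\]
\[
\gamma_\mu \not{A} \not{B} \not{C} \gamma^\mu = -2 \not{C} \not{B} \not{A} ,
\]
\[
\gamma_\mu \not{A} \not{B} \not{C} \not{D} \gamma^\mu
= 2 \left( \not{D} \not{A} \not{B} \not{C} + \not{C} \not{B} \not{A} \not{D} \right).
\]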
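The contraction identities can be verified numerically with explicit matrices. The
sketch below is a check, not a derivation: it assumes the standard Dirac
representation of the $\gamma$ matrices and the metric diag(1, -1, -1, -1), and all
helper names are illustrative.

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def scaled(c, a):
    return [[c * x for x in row] for row in a]

def added(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def trace(a):
    return sum(a[i][i] for i in range(4))

IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ZERO = [[0.0] * 4 for _ in range(4)]
METRIC = [1.0, -1.0, -1.0, -1.0]  # diagonal of the metric tensor

PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def gamma_spatial(sigma):
    """gamma^i = [[0, sigma_i], [-sigma_i, 0]] in the Dirac representation."""
    g = [[0j] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            g[i][j + 2] = sigma[i][j]
            g[i + 2][j] = -sigma[i][j]
    return g

GAMMA0 = [[(1.0 if i < 2 else -1.0) if i == j else 0.0 for j in range(4)]
          for i in range(4)]
GAMMA_UP = [GAMMA0] + [gamma_spatial(s) for s in PAULI]
GAMMA_DOWN = [scaled(METRIC[mu], GAMMA_UP[mu]) for mu in range(4)]

def contract(inner):
    """Return gamma_mu * inner * gamma^mu, summed over mu."""
    total = ZERO
    for mu in range(4):
        total = added(total, matmul(matmul(GAMMA_DOWN[mu], inner), GAMMA_UP[mu]))
    return total

def check_identities():
    ok = close(contract(IDENTITY), scaled(4, IDENTITY))
    for a in range(4):
        ga = GAMMA_UP[a]
        ok = ok and close(contract(ga), scaled(-2, ga))
        for b in range(4):
            gb = GAMMA_UP[b]
            g_ab = METRIC[a] if a == b else 0.0
            ok = ok and close(contract(matmul(ga, gb)), scaled(4 * g_ab, IDENTITY))
            ok = ok and abs(trace(matmul(ga, gb)) - 4 * g_ab) < 1e-12
            for c in range(4):
                gc = GAMMA_UP[c]
                lhs = contract(matmul(matmul(ga, gb), gc))
                rhs = scaled(-2, matmul(matmul(gc, gb), ga))
                ok = ok and close(lhs, rhs)
    return ok

print("all identities hold:", check_identities())
```

Since the identities follow from the anticommutation relation alone, the same check
would pass in any other representation of the $\gamma$ matrices.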

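The totally antisymmetric Levi-Civita tensor:

\[
\epsilon_{\mu\nu\rho\sigma} =
\begin{cases}
+1 & \text{if } (\mu, \nu, \rho, \sigma) \text{ is an even permutation of } (0, 1, 2, 3) \\
-1 & \text{if } (\mu, \nu, \rho, \sigma) \text{ is an odd permutation of } (0, 1, 2, 3) \\
0 & \text{otherwise}
\end{cases}
\]

The antisymmetric tensor $\epsilon^{\alpha\beta\mu\nu}$ satisfies the following
contraction identities.

\[
\epsilon^{\mu\nu\rho\sigma} \epsilon_{\mu'\nu'\rho'\sigma'} = -\det\left( g^{\alpha}_{\ \alpha'} \right),
\qquad \alpha = \mu, \nu, \rho, \sigma, \quad \alpha' = \mu', \nu', \rho', \sigma' ,
\]
\[
\epsilon^{\mu\nu\rho\sigma} \epsilon_{\mu\nu'\rho'\sigma'} = -\det\left( g^{\alpha}_{\ \alpha'} \right),
\qquad \alpha = \nu, \rho, \sigma, \quad \alpha' = \nu', \rho', \sigma' ,
\]
\[
\epsilon^{\alpha\beta\mu\nu} \epsilon_{\alpha\beta\mu\nu} = -24 .
\]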

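For a general D-dimensional space, where D is an integer, we have D $\gamma$-matrices
$\gamma^0, \gamma^1, \gamma^2, \ldots, \gamma^{D-1}$, which satisfy the usual
anticommutation relation.

\[
\{ \gamma^\mu, \gamma^\nu \} = 2 g^{\mu\nu} I , \tag{B.7}
\]
\[
g^\mu_{\ \mu} = D , \tag{B.8}
\]

where I here is a $D \times D$ unit matrix.

\[
\mathrm{Tr}(I) = D ,
\]
\[
\mathrm{Tr}(\gamma^\mu) = 0 ,
\]
\[
\mathrm{Tr}(\gamma_5) = 0 ,
\]
\[
\mathrm{Tr}(\text{odd number of } \gamma\text{'s}) = 0 ,
\]
\[
\mathrm{Tr}(\gamma^\mu \gamma^\nu) = D g^{\mu\nu} ,
\]
\[
\mathrm{Tr}(\gamma_5 \gamma^\mu) = 0 ,
\]
\[
\mathrm{Tr}(\sigma^{\mu\nu}) = 0 ,
\]
\[
\mathrm{Tr}(\gamma_5 \gamma^\mu \gamma^\nu) = 0 ,
\]
\[
\mathrm{Tr}(\gamma_5 \gamma^\mu \gamma^\nu \gamma^\rho) = 0 ,
\]
\[
\gamma_\mu \gamma^\mu = D I ,
\]
\[
\gamma_\mu \gamma^\alpha \gamma^\mu = -(D - 2) \gamma^\alpha ,
\]
\[
\gamma_\mu \gamma^\alpha \gamma^\beta \gamma^\mu = (D - 4) \gamma^\alpha \gamma^\beta + 4 g^{\alpha\beta} .
\]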


Completeness relation:

\[
\sum_\lambda \epsilon^*_\mu(\lambda, k)\, \epsilon_\nu(\lambda, k) = -g_{\mu\nu} + \frac{k_\mu k_\nu}{M^2}. \tag{B.9}
\]
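The completeness relation can be checked with an explicit set of polarization
vectors. The sketch below assumes a massive vector boson moving along the z axis,
circular transverse polarizations, and the metric diag(1, -1, -1, -1); the mass and
momentum used in the example are arbitrary illustrative numbers.

```python
import math

METRIC = [1.0, -1.0, -1.0, -1.0]  # diagonal of g_{mu nu}

def polarization_sum(mass, momentum):
    """Sum over the three polarizations of eps*_mu eps_nu (lower indices)."""
    energy = math.hypot(mass, momentum)
    root2 = math.sqrt(2.0)
    # Contravariant components for k = (E, 0, 0, p).
    eps_upper = [
        [0.0, 1 / root2, 1j / root2, 0.0],         # transverse, one circular state
        [0.0, 1 / root2, -1j / root2, 0.0],        # transverse, the other
        [momentum / mass, 0.0, 0.0, energy / mass] # longitudinal
    ]
    eps_lower = [[METRIC[m] * e[m] for m in range(4)] for e in eps_upper]
    return [[sum(e[m].conjugate() * e[n] for e in eps_lower) for n in range(4)]
            for m in range(4)]

def expected_sum(mass, momentum):
    """Right-hand side: -g_{mu nu} + k_mu k_nu / M^2."""
    energy = math.hypot(mass, momentum)
    k_lower = [energy, 0.0, 0.0, -momentum]
    return [[-(METRIC[m] if m == n else 0.0) + k_lower[m] * k_lower[n] / mass ** 2
             for n in range(4)] for m in range(4)]

def max_deviation(mass, momentum):
    lhs = polarization_sum(mass, momentum)
    rhs = expected_sum(mass, momentum)
    return max(abs(lhs[m][n] - rhs[m][n]) for m in range(4) for n in range(4))

print("largest deviation:", max_deviation(91.19, 30.0))
```

The longitudinal vector is what generates the $k_\mu k_\nu / M^2$ term; dropping it
from the list leaves only the transverse part of the sum.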




BIOGRAPHICAL  SKETCH 


Mohammad Reza Tayebnejad was born on September 5, 1960 in the small
town of Manjill in Iran. He came to the United States in 1985. After two years
of education at Santa Fe Community College he transferred to the University of
Florida in Gainesville where he pursued his goal of studying physics. In the fall
of 1989, he was admitted to the graduate program in the physics department at the
University of Florida. In 1991, he started his research in high energy physics as a
student member of the Institute for Fundamental Theory under the supervision
of Professor Richard D. Field. He is currently in the process of completing his
Ph.D. degree in physics, specializing in high energy phenomenology.




I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.


Richard  D.  Field,  Chair 
Professor  of  Physics 


I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.


Pierre Ramond
Professor  of  Physics 

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.

Pierre  Sikivie 

Professor  of  Physics 

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.


Paul  Avery 
Professor  of  Physics 


I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.


Paul Lee Robinson
Associate  Professor  of  Mathematics 


This dissertation was submitted to the Graduate Faculty of the Department of
Physics in the College of Liberal Arts and Sciences and to the Graduate School and
was accepted as partial fulfillment of the requirements for the degree of Doctor of
Philosophy.

May  1997 


Dean,  Graduate  School 


